DOCUMENT RESUME 

ED 324 977 FL 018 991 



AUTHOR          Stansfield, Charles W.; And Others
TITLE           English-Spanish Verbatim Translation Exam.
INSTITUTION     Center for Applied Linguistics, Washington, D.C.
SPONS AGENCY    Federal Bureau of Investigation, Quantico, VA.
PUB DATE        7 Nov 90
NOTE            232p.
PUB TYPE        Reports - Descriptive (141)

EDRS PRICE      MF01/PC10 Plus Postage.
DESCRIPTORS     Content Validity; *English; Language Proficiency;
                *Language Tests; *Spanish; *Test Construction; Test
                Items; *Translation
IDENTIFIERS     *English Spanish Verbatim Translation Test; *Federal
                Bureau of Investigation



ABSTRACT 

The development and validation of the English-Spanish 
Verbatim Translation Exam (ESVTE) is described. The test is for use 
by the Federal Bureau of Investigation (FBI) in the selection of 
applicants for the positions of Language Specialist or Contract 
Linguist. The report is divided into eight sections. Section 1 
describes the need for the test, reviews the literature on the 
testing of translation ability, and discusses the development of
translation skill level descriptions. Section 2 describes the 
multiple-choice and production sections of the ESVTE, scoring 
procedures and time limits. Sections 3 and 4 describe the 
development, trialing, and pilot testing. Section 5 describes the 
design and validation study, which included members of the FBI, 
Houston Police Department, and professional translators. Section 6 
presents statistics on the scores of the subjects, and analyzes the 
reliability of each ESVTE section. Section 7 discusses content 
validity. Section 8 describes the equating of the two parallel forms, 
and the establishment of a cut score on the ESVTE multiple-choice 
section. Appended materials include sample test items, administration 
instructions, scoring guidelines, the FBI/Center for Applied
Linguistics Translation Skill Level Descriptions, questionnaires, and
other data collection instruments. (Author/VWL) 
Abstract



This document describes the development and validation of 
the English-Spanish Verbatim Translation Exam (ESVTE) for use
by the Federal Bureau of Investigation (FBI) in the selection of
applicants for the positions of Language Specialist or Contract 
Linguist. The report is divided into eight sections. Section 1 
describes the need for the test, reviews the literature on the 
testing of translation ability, and discusses the development of 
translation skill level descriptions. Section 2 describes the 
multiple-choice and production sections of the ESVTE, scoring 
procedures and time limits. Sections 3 and 4 describe its
development, trialing and pilot testing on translation students 
at Georgetown University. Section 5 describes the design of the 
validation study, which included 42 employees of the FBI, members 
of the Houston Police Department, and professional translators. 
Section 6 presents descriptive statistics on the scores of the 
above subjects, and analyses the reliability of each ESVTE 
section using traditional methods and generalizability theory.
The results indicate that the ESVTE is quite reliable for a test 
that involves free response items. Section 7, the longest of the 
report, begins with a discussion of content validity. Subsequent 
subsections discuss the evidence for construct, criterion- 
related, convergent and discriminant validity based on the 
results of the validation study. The results indicate that the
two ESVTE constructs, Accuracy and Expression, are highly
interrelated because of a lack of variation in the English ability
of the subjects. Section 8 describes the equating of the two
parallel forms, and the establishment of a cut score on the ESVTE 
multiple-choice section, which can be used as a screening test. 
The 18 appendices include sample test items, administration 
instructions, scoring guidelines, the FBI\CAL Translation Skill 
Level Descriptions, questionnaires and other data-collection 
instruments.
Abstract

English-Spanish Verbatim Translation Exam (ESVTE). The ESVTE
was developed by staff of the Foreign Language Education and 
Testing Division of the Center for Applied Linguistics (CAL) 
under contract with the Federal Bureau of Investigation (FBI). 
The ESVTE is designed to be a job-related test of the ability to 
render a translation in Spanish of a text written in English. 
The report is divided into eight sections, plus appendices.

Section 1 provides an introduction to the project and
establishes a framework for it. This section describes
the groups that would potentially be given the test, the survey 
of the types of documents for which the FBI requires translation, 
the development of FBI\CAL skill level descriptions for 
translation, the nature of translation, and the emergence of the 
two constructs of translation ability that can be measured by the 
ESVTE. 

Section 2 provides a description of the test, which is 
divided into multiple choice and free response sections. The 
scoring of the test is also described, and the computation of the
total scores on two criteria, Accuracy and Expression, is
discussed.

Sections 3 and 4 describe the development, trialing, and 
pilot testing of the ESVTE on 50 students majoring in translation 
at Georgetown University and the successive revisions the ESVTE 
underwent during its development. 
Section 5 describes the validation study that was conducted
on the final version of the test. It discusses the test
administration procedures, the sample, and the scoring of the 
tests. For this study, 42 examinees took both forms of the 
ESVTE. The subjects were FBI Language Specialists and Contract 
Linguists, Special Agents, and support staff, as well as members
of the Houston (Texas) Police Department. 

Section 6 presents descriptive statistics on test 
performance from the validation study as well as a detailed 
analysis of the reliability of the test. Reliability analyses 
include internal consistency, product moment correlations, and 
generalizability coefficients.

Section 7 presents the discussion of the validity. For this 
study, additional data were collected from employee files in the
form of independent measures of proficiency in Spanish and 
English, and scores on an earlier generation of FBI translation 
tests. Subjects also completed a self-rating of the ability to 
translate various types of FBI documents. A number of 
statistical analyses were performed on the data. The results
support the validity of ESVTE scores for screening, selecting,
and placing FBI applicants and staff in positions requiring
English-Spanish translation ability.

Section 8 of the report describes the development of a score 
conversion table, which can be used to convert scores on the 
ESVTE to an overall rating of translation proficiency on a 0 to 5 
1. Introduction



This section of the report on the English into Spanish 
Verbatim Translation Exam (ESVTE) is intended to provide the 
reader with some appropriate background as a preliminary to a
discussion of the test.

1.1. Need for the Test

The Federal Bureau of Investigation (FBI) is the Federal 
Government's principal agency responsible for investigating 
violations of federal statutes. The overall objective of the FBI 
is to investigate criminal activity and civil matters in which
the Federal Government has an interest, and to provide the 
Executive Branch with information relating to national security. 
FBI activities include investigations into organized crime, 
white-collar crime, public corruption, financial crime, fraud 
against the government, bribery, copyright matters, civil rights 
violations, bank robbery, extortion, kidnapping, air piracy, 
terrorism, foreign counterintelligence, interstate criminal 
activity, fugitive and drug trafficking matters, and other 
violations of more than 260 federal statutes. 

In all of the above areas of jurisdictional responsibility,
it is likely that the FBI could be called upon to investigate a 
large number of cases that involve languages other than English. 

Because of this, it is understandable that the FBI is being 
increasingly called upon to provide Special Agents and support 
staff who are proficient in a foreign language. All modes of
communicative skills may be required. That is, FBI staff may
need to be able to speak, understand, read or write the foreign
language. They may also be required to provide oral
interpretation or written translation. Often, they are called
upon to provide a written summary in English of a foreign
language conversation.

The need to assess employees' or potential employees'
language skills can be satisfied in a number of ways. To measure
the speaking skill, the FBI has used the Interagency Language
Roundtable (ILR) Oral Proficiency Interview for many years. To
measure the listening and reading skills, the FBI uses the
Listening and Reading sections of the Defense Language
Proficiency Test (typically version II) (Walker et al., 1988).
These exams are taken by applicants for the positions of Special
Agent Linguist,* Language Specialist, and Contract Linguist.

The FBI also has the need to measure the ability to provide
a written English summary of a non-English conversation.
Frequently, this conversation involves a telephone communication
that has been authorized by a magistrate as part of an ongoing
criminal investigation. CAL developed the Listening Summary

*Special Agent Linguists are Special Agents who are
qualified to investigate crimes involving foreign languages.
Translation Exam (LSTE) as part of its contract with the FBI.*
The development and validation of the LSTE is the subject of a
separate report (Stansfield, Scott & Kenyon, 1990a), and is not
formally treated in this report.

The FBI also has the need to measure the ability to
translate written documents. Up until now, this need has been
satisfied, for about 20 languages, through two parallel
translation exams. Since these exams are secure instruments, CAL
staff know nothing about them other than the fact that the FBI
feels a need to develop new translation exams. For this reason,
the FBI issued a request for proposals (RFP) to develop
completely new tests of translation skill (Spanish into English
and English into Spanish), which are the subject of this report
and a companion report (Stansfield, Scott & Kenyon, 1990b).

1.2. Intended Use

The ESVTE is designed for use in the hiring of Language
Specialists and Contract Linguists. Language Specialists are
full-time regular employees of the FBI, while Contract Linguists
are self-employed and work on an hourly basis. The translating
work of Language Specialists and Contract Linguists is primarily
audio-to-document or document-to-document. The subject matter
may be in any area in which the FBI has jurisdiction. As

*The LSTE presents taped Spanish language conversations as
stimuli and requires the examinee to answer multiple-choice
questions or to provide a written summary as a response. The
LSTE provides scores on the accuracy (including adequacy) of the
information in the summary and on the quality of the English
expression contained in the summary.
indicated on an FBI job announcement, an FBI Language Specialist
is a full-time employee whose duties are to "translate both
recorded and written material into English and vice versa, which
involve a wide range of difficult subject matter containing
technical or specialized terminology such as used in fields of
law, politics, science, economics, and international exchange, as
well as nontechnical subject matter."

The ESVTE would be taken by civilians who are applying for 
these two categories of position and by current FBI employees, 
such as support staff, who are seeking a promotion to the 
position of Language Specialist. 

According to the statement of work in the RFP, CAL is to
provide a test that can measure translation ability at levels 2+ 
through 5. Such levels would be appropriate for Language 
Specialists and Contract Linguists. ESVTE scores will provide 
supervisors with an indication of the examinee's suitability for a
given work assignment involving English to Spanish translation. 
1.3. FBI Translation Needs Survey 

One of the first tasks undertaken during this project was 
the development of a questionnaire for the purpose of conducting
a survey of the type of translation work required of Language 
Specialists in FBI field offices. It was hoped that this survey 
of the FBI's translation needs would be of help in determining an 
appropriate balance of topics and tasks for the tests to be 
developed. This questionnaire was developed by CAL staff during 
August 1988 and was subsequently revised by the FBI. Following
these revisions, FBI Headquarters mailed two copies of the
questionnaire to Language Specialists working in FBI field
offices across the country. A total of 28 Language Specialists
responded to the questionnaire. The questionnaire concerned
translating from Spanish to English and from English to Spanish.
The last page of the questionnaire was devoted to translating
from English to Spanish. A copy of the questionnaire and the
results are included in Appendix Q. The questionnaire required
the Language Specialists to indicate the proportion of time they
spent translating each type of document listed in the
questionnaire. Unfortunately, the results of the questionnaire
are limited, since many individuals' responses totaled more than
100%. Still, the results of the questionnaire did provide
supporting information for the development of the LSTE, the 
ESVTE, and the SEVTE. In general, the results indicated that 
Language Specialists spend more time on listening tasks than 
translating written texts, particularly monitoring and 
translating telephone and recorded conversations. They are also 
called upon to provide oral interpretations. 

More than half of the Language Specialists responding 
indicated they are often called upon to translate or summarize 
written material. The material these respondents most often 
encountered dealt with organized crime, narcotics, terrorism, and 
counterintelligence.

The results of this survey were used to select topics for 
the written and recorded stimuli that appear on the three tests 
developed for this project.

1.4. FBI\CAL Translation Skill Level Descriptions
1.4.1. History

Over the years there have been a number of attempts by 
government agencies to develop skill level descriptions (SLD) for 
translation. None of these have been accepted outside of the 
agency in which they were developed. The FBI also developed a 
set of Translation SLDs a number of years ago. However, the 
Bureau was not satisfied with them. As a result, the Statement 
of Work in the FBI's Request for Proposals called for the 
development of new translation skill level descriptions (see 
Appendix R). The statement of work also called for scores on the
test to be convertible to the 0-5 ILR scale. As a result, CAL 
proposed the development of such skill level descriptions as part 
of this project. Once the project was funded, the first 
deliverable to be developed was the Translation SLDs. These were 
needed to inform the test development process, and, in 
particular, to inform the scoring of the test and the conversion 
of the scores to the 0-5 scale. Thus, soon after notification of 
funding was received, CAL staff went to work on the skill level
descriptions. 

In July 1988, CAL staff met with the project monitor and
five FBI staff at FBI headquarters. Attending were FBI master
translators.* At this meeting it was agreed that, in order to
help CAL begin the development of ILR skill level descriptions
for translation, by the end of the month the FBI staff present
would write a personal definition of what constitutes an
excellent translator, a good translator, a mediocre translator, a
poor translator, and a bad translator. It was agreed that CAL
would use the descriptions of these five groups of translators as
a point of departure for preparing skill level descriptions for
translation. Because FBI staff were familiar with the ILR SLDs,
their descriptions showed a similarity in form to these
descriptions. The following description of a "mediocre"
translator illustrates the kind of descriptions that were
received.

"Able to provide an understandable and fairly accurate
translation of a larger number of texts, but still makes a number
of mistranslations. Problems with spelling, grammar, and
punctuation. Becomes lost when structure becomes complex or
language more sophisticated and has serious problems with slang,
idioms and handwritten materials."

The descriptions of different groups of translators provided
by FBI staff, although brief and informal, were used as a
starting point for writing skill level descriptions.

*Language specialists at FBI Headquarters in Washington, DC
are referred to as Master Translators.

CAL staff began by writing descriptions for level 5 
translation, and then worked down the scale to level 0+. The 
first set of skill level descriptions was drafted by Ana Maria 
Velasco, an experienced translator familiar with the ILR scale. 
She drafted the descriptions based on her experience evaluating 
the work of many different translators. In consultation with the
project director, Dr. Velasco selected seven variables that
should enter into the judgement or rating of a translation.
These were accuracy, grammar (morphology), syntax (word order),
style, tone, spelling, and punctuation. She placed these
variables on the vertical axis of a scoring grid (matrix). The
horizontal axis contained 10 points on the ILR scale ranging from
0+ to 5. In each cell of the grid, she included a statement of
the nature of translations at that level. Both skill level
descriptions and a scoring grid were developed, since it was
thought that a scoring grid that separated each translation
variable by level and allowed comparisons by variable across
levels would be helpful to raters. It was also recognized that
the grid would be useful in the revision of the skill level
descriptions for the same reason. That is, the description of
ability on each relevant variable in the scoring grid could be
consulted in the writing of the skill level descriptions. The
final reason for producing the scoring grid was that we did not
know at the time which document, the grid or the skill level
descriptions, could be used to score the test more reliably.
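The two axes of the first-draft grid can be pictured as a simple lookup table. The following Python sketch is only an illustration under our own assumptions: the names `build_grid`, `row`, and `column` and the placeholder cell text are hypothetical, and the actual descriptor wording of the grid is not reproduced here. It shows why one grid served both purposes described above: reading a row compares a single variable across levels (the rater's view), while reading a column gathers every statement for one level (the view needed when drafting a level description).

```python
# Hypothetical sketch of the shape of the first-draft scoring grid.
# The cell text is a placeholder, not the grid's actual wording.

# Ten points on the ILR scale on the horizontal axis, from 0+ to 5.
LEVELS = ["0+", "1", "1+", "2", "2+", "3", "3+", "4", "4+", "5"]

# The seven variables on the vertical axis of the first draft.
VARIABLES = [
    "accuracy",
    "grammar (morphology)",
    "syntax (word order)",
    "style",
    "tone",
    "spelling",
    "punctuation",
]


def build_grid(describe):
    """Map every (variable, level) cell to a descriptor statement.

    `describe` is a hypothetical callback supplying the text for one cell.
    """
    return {(var, lvl): describe(var, lvl) for var in VARIABLES for lvl in LEVELS}


def row(grid, variable):
    """One variable compared across all levels (the rater's view)."""
    return [grid[(variable, lvl)] for lvl in LEVELS]


def column(grid, level):
    """All variables at one level (useful when drafting a level description)."""
    return [grid[(var, level)] for var in VARIABLES]


if __name__ == "__main__":
    grid = build_grid(lambda var, lvl: f"<{var} at level {lvl}>")
    print(len(grid))                   # 70 cells: 7 variables x 10 levels
    print(row(grid, "spelling")[-1])   # <spelling at level 5>
```

Later revisions changed the rows (syntax folded into grammar, vocabulary added), which in this representation would only mean editing `VARIABLES`.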

The project director then reviewed the skill level
descriptions and the scoring grid, making revisions where
appropriate. His revisions were based on careful analysis of the
wording of all the current ILR skill level descriptions,
particularly the reading level descriptions. The revised SLDs
and the scoring grid were then subject to careful review by
Marijke Walker and her staff at the FBI. They responded to the
draft descriptions based on their experience evaluating the
translations of Language Specialists and applicants for
employment as a Language Specialist. After receiving a set of
comments from Ms. Walker, CAL revised both documents. A major
revision to occur at this point, at the suggestion of Ms. Walker,
was the inclusion of syntax within grammar on the scoring grid
and the addition of vocabulary to the grid. (A copy of the grid
is included in Appendix I as Exhibit A.) Another substantive
revision was a change in the percentage-correct criteria for
punctuation and spelling at level 5. It was decided that for
purposes of the grid, the translation need not be absolutely
perfect in spelling in order to be at level 5. A brief
description of the kinds of documents that can typically be
handled by a translator at each level was also included.

On December 5, 1988, a meeting was held at FBI Headquarters 
to review the revised set of Translation SLDs. Present at the 
meeting were Charles W. Stansfield and Ana Maria Velasco from 
CAL, Marijke Walker and her staff, Thomas Parry from the Central 
Intelligence Agency, and James Child from the Department of 
Defense. During this meeting it was noted that the draft 
Translation SLDs describe the characteristics of the translated 
document, while ILR SLDs for other modes of communication 
describe the skills of the person being evaluated. It was 
suggested that the Translation SLDs should consistently describe 
the translator, rather than the translated document. It was also 



agreed to introduce this current draft of the descriptions to the 
ILR Testing Committee before making any revisions, and to ask 
committee members for written comments regarding how the draft 
can be improved. 

These Translation SLDs were the subject of a brief
discussion at the December meeting of the ILR Testing Committee
two days later. Members of the committee were given a
questionnaire concerning the SLDs to complete and mail to CAL
(see Appendix I, Exhibit B). Unfortunately, no questionnaires
were returned. The committee met again in February, 1989, with 
essentially the same outcome. While general and conceptual 
concerns were expressed at the meeting about the SLDs, only three 
specific suggestions for improvement were made. These 
suggestions were a.) to change the descriptions so that they 
referred to the translator rather than to the translation, as 
suggested earlier, b.) to use the term "to render" when referring 
to the act of translating, and c.) to reorder the descriptions so 
that they begin with level 0 and progress to level 5. 

Following this meeting, Charles Stansfield and Marijke 
Walker worked jointly on several occasions to improve the SLDs. 
The ILR Testing Committee met again on March 8, 1989, to consider 
the next revision. At this meeting it was not possible to obtain
organized and coherent feedback or approval on the descriptions.
Thus, CAL and the FBI later agreed that the level
descriptions being developed for this project would be used by
the FBI, and that they would be available to the ILR for use as
interim SLDs until such time as the ILR Testing Committee has
time to consider and revise them further. Subsequently,
Stansfield and Walker met again to make additional revisions to
the SLDs. These revisions included the incorporation of some of
the wording used in the previous set of Translation SLDs used by
the FBI. The task of developing and revising the Translation 
SLDs was completed in June, 1989. No further work was done on 
them for seven months. 

The Verbatim Translation Exams that CAL developed for the 
FBI were administered during the months of November and December 
1989. After scoring the Listening Summary Translation Exam, CAL
staff and consultants scored the production portions of the
verbatim translation exams. Soon it became apparent that there
were limitations in the ability of the SLDs to describe all
examinees. The problem seemed to lie in the fact that some
examinees were translating into their native language and some
into a second language. In the case of a number of examinees,
there was a considerable discrepancy between their proficiency in
the two languages. Examinees who were translating into their native
language, especially English, produced translations that were
very fluent and grammatical, but inaccurate in terms of content.
Conversely, when translating into the second language, some
examinees produced accurate translations that evidenced problems
with grammar or vocabulary. As a result, on January 30, 1990,
Stansfield and Scott sent a memo to Marijke Walker at the FBI in
which they recommended that the current SLDs be divided into two
parts, one for Accuracy and one for Expression, and that
separate scores be assigned for each. CAL also recommended that
the discussion of the kinds of documents a translator at a given
proficiency level can handle be deleted from the SLDs, since the 
verbatim exams did not provide the opportunity to examinees to 
translate all of the types of documents mentioned. The FBI 
agreed to this change. Significantly, the results of the
validation study supported this division of translation
abilities.

The current version of the SLDs is basically the same as the 
one that was used to score the Verbatim Translation Exams.
However, after the scoring of the test was completed, we realized 
that the discussion of the kinds of documents a translator at a 
given proficiency level can successfully render is useful 
interpretive information for test score users.* Therefore, the 
version of the SLDs included in this report presents this 
discussion following the SLDs for Accuracy and Expression. It 
should be remembered, however, that the raters of the ESVTE did
not use this interpretive information when scoring the responses 
of examinees who participated in the validation study. 
1.4.2. Explanation of Skill Level Descriptions 

The FBI\CAL Translation SLDs are divided into three parts. 
The first part is the Accuracy description. Accuracy is the 



*It should be pointed out that there are no empirical data,
in the form of a criterion-related or predictive validity study,
to support this interpretive information.
ability to correctly convey the information in the source
document. The second part of the description is the Expression
description. This describes the examinee's command of the
written form of the target language. The third part of the
translation skill level descriptions is the interpretive
information. This is a sentence describing the general ability
level of the examinee and the types of documents that he or she
can be expected to translate successfully.

Because an examinee may be called on to translate into his 
or her native language or second language, it was necessary to 
separate the ratings for Accuracy and Expression. By evaluating 
Accuracy and Expression separately, the level descriptions can be 
used to characterize an examinee whose translation is accurate
but may evidence some problems with grammar or vocabulary.
Otherwise, two very different examinees might receive the same
score from a rater who is attempting to compensate for either a
lack of accuracy in the information conveyed or a lack of grammaticality in
the translation. A personnel administrator trying to make a 
decision on hiring would not have sufficient information from a 
score combining Accuracy and Expression to make an informed 
decision. This is because a translator working into his or her
native language may, for example, be a level 4 in Expression but
only a level 2 in Accuracy. Such an individual could not handle
the kind of
documents mentioned in the ILR reading descriptions for Level 3 
or those mentioned in the interpretive information for level 3 of 
the Translation SLDs. On the other hand, with separate scores
available for Accuracy and Expression, an administrator would be 
able to make a decision to hire an examinee whose translations 
would be accurate though unpolished. 
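The argument for separate ratings can be illustrated with a small Python sketch. The averaging rule and the Accuracy threshold of 3 below are hypothetical choices made only for illustration; nothing in this report says the ESVTE combines ratings this way or uses this cutoff. Under those assumptions, the two profiles discussed above (fluent but inaccurate, accurate but unpolished) collapse to the same combined score, while the separate Accuracy rating still tells them apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """Separate ratings on the 0-5 scale (whole levels only, for simplicity)."""
    accuracy: int
    expression: int


def combined(r: Rating) -> float:
    # Hypothetical single score: the mean of the two ratings.
    return (r.accuracy + r.expression) / 2


def meets_accuracy(r: Rating, minimum: int = 3) -> bool:
    # Hypothetical hiring screen that looks at Accuracy alone.
    return r.accuracy >= minimum


# Translating into the native language: fluent but inaccurate.
native_target = Rating(accuracy=2, expression=4)
# Translating into a second language: accurate but unpolished.
second_language_target = Rating(accuracy=4, expression=2)

print(combined(native_target), combined(second_language_target))  # 3.0 3.0
print(meets_accuracy(native_target), meets_accuracy(second_language_target))  # False True
```

A single averaged score hides exactly the information an administrator needs; keeping the two ratings as separate fields preserves it.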

The three parts of the Translation SLDs, unlike the SLDs for 
listening, speaking, reading and writing, must be in separate 
sections. This is because translation involves two languages, 
and the examinee's ability in each language may not be equal. 

The first part of the SLDs is the Accuracy description. The 
Accuracy description focuses on whether the information contained 
in the source document is distorted or lost in the translation, 
or whether information has been inserted in the translation that 
was not in the source document. In the field of translation, 
such problems are referred to as mistranslation, omission, or 
addition. Scoring a translation for Accuracy requires comparing 
it with the original. The Accuracy descriptions presented here 
refer to accuracy in translating a wide variety of documents. 
The Accuracy descriptions refer to the ability to sustain 
performance (to render the document into the target language
successfully) over a wide variety of documents varying in type 
and difficulty, rather than a single document. In general,
Accuracy is the principal ability being measured in a test of
translation. Thus, the Accuracy rating is the principal rating
of the examinee's ability to translate.

Again, it must be remembered that this rating is descriptive 
of the ability to translate a wide variety of documents. A level
3 translator may translate a level 1 document perfectly, thus
making it appear to be a level 5 translation. Similarly, the
same translator given a level 5 document may produce a
translation that appears to be less than level 3.

Because the accuracy of a translation may vary according to 
the difficulty of the document being translated, the developer of
translation skill levels faces a dilemma. It is necessary to
choose a type of document or level of document (in terms of
difficulty and complexity) on which to base the Accuracy
descriptions. In this case, we chose to describe Accuracy in
rendering a hypothetical "average" or typical document. An
average document, in terms of difficulty, would be one at level 3
or mostly at level 3, which would make it a 2+. A level 3
translator would be able to translate an average document. As
the translator moves above level 3 in ability, he or she, by
definition, can handle documents of above-average difficulty.
That is, he or she can handle documents at level 3+, 4, or even
higher. The Accuracy description thus represents both the
translation ability level of the examinee and the level of task
or document that the examinee can handle adequately.

The second part of the skill level descriptions is the 
Expression description. Expression involves all the linguistic
variables apparent in a translated document except Accuracy.
These variables are grammar, syntax, vocabulary, style, tone,
spelling, and punctuation. In general, it is possible to score a
translation for most of these variables without referring to the
source document. However, it will sometimes be necessary to compare the
source document with the translated document, particularly if the
style and tone of the translated document are to be evaluated.

The discussion of the type of documents a person can handle
that initiates each SLD for the other skills is not truly part of 
the translation scale. It is merely score interpretation 
information that is of interest to score users.* 



*If the information on the type of documents a translator
can render were to be incorporated into the translation SLDs, 
then a rater would have to administer the documents mentioned to 
an examinee in order to verify that the statement is correct. 
This would require some type of tailored face-to-face testing. 
That is, the test administrator would have to select and 
administer a document to the examinee. Then, the test 
administrator would have to wait for the examinee to render a 
written translation of the document. Once the rater received the 
document, it would have to be scored immediately. Then, the test 
administrator would have to select another document, associated 
with a higher or lower level on the scale, and administer it to 
the examinee, and continue the process again until the rater was 
satisfied that he or she had identified the highest level of 
document that the examinee is able to translate faithfully. To 
do this, would require a full day to test each examinee, which is 
impractical for reasons of cost. Thus, the interpretive 
information in the translation SLDs is not of interest to raters 
of translated documents. 

Another theoretical possibility involving tailored testing 
would be to let a computer select, administer, and score the
translation using the skill level descriptions as a basis for 
scoring. While a computer could select a document of 
predetermined difficulty, and administer it to the examinee, and 
the examinee could key-enter a translation of the document on the 
computer screen, it is not yet feasible for a computer to score a 
translation using even an analytic scale, and it is doubtful that 
a computer will be able to use a holistic scale (such as the 
SLDs) for many years to come. Thus, it is not possible to 
develop a tailored test of translation ability at this time. 
Other ILR SLDs, such as those for speaking and reading, assume 
that tailored face-to-face testing is possible. Thus, the 
inclusion in the other ILR SLDs of the type of documents or tasks 
that can be rendered is more logical. It is not logical to
include them as an integral part of the Translation SLDs.

When using the interpretive information, a score user should
remember that it refers to the type of documents that an examinee
can render successfully. Efforts to translate more sophisticated
documents than those associated with that level or lower levels
will result in less than adequate translations.
1.5. The Nature of Translation Ability
1.5.1. The Need to Define the Construct

Bachman (1990, p. 251), citing Upshur, distinguishes between
viewing a test score as a pragmatic ascription (the individual is
able to perform a task) and viewing a test score as a measure
of some human construct (the individual has a certain ability).
He notes that there is often confusion between the measurement of
the activity and the measurement of the construct and the
processes that underlie it. Indeed, he notes that the activity
is often confused with the construct and vice versa.

Bachman's characterization of this confusion regarding
validity is somewhat analogous to the dilemma we encountered when
we wrote our proposal to do this project in September 1987. In
this case, we started with products (translations), and in the
process of developing the test, we identified the constructs
involved in the measurement of translation ability. We learned
that translation ability is most appropriately expressed through
two main constructs, accuracy and expression.

It is important to distinguish between translation ability
as a measurement construct and translation ability as a
psychological construct. A measurement construct is one that 
holds up under statistical analysis, such as factor analysis or 
other appropriate procedures; it should be supported by
descriptions of the psychological construct, which refers to the 
mental operations and processes involved. Neither the 
measurement construct nor the psychological construct was 
understood at the start of this study. Thus, we entered the
study fully aware that we were sailing uncharted waters. While
hopeful that we would make some discoveries, we knew that any
test we constructed might not stand up to scientific analysis,
and that we might fail in our effort to construct a reliable and
valid test of translation ability.

In terms of a psychological construct, we identify 
translation ability as a nexus of psychological and linguistic
knowledge, skills and abilities that can be combined with real 
world knowledge to produce a translated document. This is an 
initial definition of translation as a process; it is in no sense 
a description of the process. At present, there is almost no 
understanding of the translation process. Moreover, the level of 
ignorance about translation is exacerbated by the fact that many 
translators have written about it and their writings create the 
impression that a literature on the process exists and, 
therefore, that the process is at least partly understood. 
1.5.2. The Literature on Translation 

The writing of translators about translation has focused on
the best approach to translation.⁶ Two main approaches have
been proposed: literal translation and free translation. Those
who espouse literal translation strive to be faithful to the
language of the source document, while those who espouse free
translation strive to produce a rhetorical effect similar to that
of the source document. Thus, academic discussions of
translation center on the subject of equivalence, that is, on how
one produces a target document that is equivalent to the source
document.⁷

A discussion of this nature is far from a scientific
discussion. Indeed, almost everyone who writes about translation
appears to be unaware that translation is an ability that can be
the subject of scientific inquiry. Moreover, when the
possibility of developing a scientific knowledge base about
translation is raised, it is quickly dismissed. In regard to
this possibility, Newmark, who is probably the best known of
those who write about translation, has stated: "There is no such
thing as a science of translation, and there never will be"
(1981, p. 113).



⁶Because the literature on translation was largely unhelpful
and did not inform this test, we have not attempted to include a
formal review of the literature here. Instead, we give only
a brief summary of the literature.

⁷Recently, there has been some attention to the role of text
characteristics in determining the approach to use. For a
summary of the rhetoric on equivalence and on the role of text
characteristics, see Pochhacker (1989).
Apart from the questions of approach and equivalence, there
is also some literature on the nature of a good translation,
which might appear to be relevant to the measurement of
translation ability. In one portion of this literature,
translators describe problems they encountered in
translating specific documents. Another portion of this
literature discusses the characteristics of a good translator or
translation. The characteristics are usually stated in the form
of ascriptions, e.g., "is sensitive to the nuances of words in
both languages" or "is sensitive to style, tone, and purpose." Such
ascriptions do not help us to understand translation as a
psycholinguistic process or point us to the appropriate
constructs to measure.

Some authors have noted that there are certain prerequisites 
to being a translator. Apart from the attitudinal 
characteristics, such as a love of language, most notable among 
these are a knowledge of the language of the source document, a 
knowledge of the language of the target document, and some
knowledge of the subject.⁸ Again, this information, while
accurate, was not helpful to us in developing a test of 



⁸Knowledge of the subject is viewed as being less important,
since it is considered that one can learn this quite easily by
reading on the subject prior to beginning the translation. It is
interesting to note that we did not encounter a single mention of
"schema theory" in writings on translation.
translation ability.⁹

In this study, we identified accuracy and expression as the 
measurement constructs of relevance. We define accuracy as the
ability to render the information or propositions in the source 
document into the target document without mistranslations, 
additions, or deletions. We define expression as the ability to 
express oneself appropriately in the target language in the 
context of a translation. 

We could not identify these constructs at the start of the
project. Instead, they emerged slowly as the project progressed.
As indicated in section 1.4, the first task of this project was
the development of skill level descriptions (SLDs). These SLDs
combined statements referring to accuracy, to categories of
expression, and to the type of documents a translator can handle.
The SLDs were written so that they could be used in some way when
scoring the test or referenced when interpreting the test score.
Once the descriptions were drafted, we began developing the
tests.

The process of scoring trial tests and pilot tests provided
us with more experience in the measurement of translation. For 
instance, pilot testing indicated that people performed much 



⁹At the start of the study, we did a computer-assisted
search of the ERIC database, using "translation" and "language
testing" as major descriptors. The seven titles this search
produced dealt with translation as a method for testing language
proficiency or achievement. Not a single one dealt with the
measurement of translation ability per se.
better when translating into their native language. Thus, we
learned that a single rating on the skill level descriptions could
not be used to characterize translation ability in both directions.
For the sake of parsimony, we had initially hoped that it would
be possible to characterize a translator through a single
proficiency rating that would indicate his or her ability to
translate in both directions; that is, from native language to
target language and from target language to native language.
While this may seem naive in retrospect, at the time we were
influenced by the elimination of the distinction between native
languages and second languages in linguistics (see Kachru, 1985),
since proficiency in either can range from almost none to
distinguished. Thus, we were not willing to accept the
recommendation that separate sets of SLDs be developed for
translating in each direction. Since we believed a single set of
SLDs would be adequate, we also believed that a single rating
could characterize translation ability in both directions, and
that separate ratings for each direction were not necessary. The
experience of scoring pilot tests that were given in both
directions made us doubt this assumption, and in the ensuing
months we abandoned the idea entirely. Still, we believed, and
continue to believe, that the same set of SLDs can be used for
both directions, and that the development of a separate set of
SLDs for translating into the native language and another for
translating into the second language is unwise.¹⁰ Thus, we began
the project believing that a single holistic score could
represent translation ability, and by the end of the pilot
testing we had modified our ideas so that we now believed that
two scores, one for translating in each direction, would be
necessary.

At this point another experience began to influence our
ideas. During the fall of 1989, we administered, scored, and
analyzed the Listening Summary Translation Exam (LSTE). This test,
which is the subject of another report (Stansfield et al.,
1990a), produced two scores, one for Accuracy and one for
Expression. A separate score for Expression had always been
considered for this test, since we were aware that deficiencies
in English writing ability have posed a problem for the FBI when
translations of oral conversations are introduced in court. That
is, even if a translation is accurate, if it is written poorly,
the credibility of the information it contains becomes tainted.

The analysis of the LSTE showed the validity of the Accuracy
rating in terms of its correlation with other measures of
proficiency in the language of the auditory stimuli. The
analysis also showed Expression to be an entity different from
and often unrelated to Accuracy. As a result, we concluded that
Accuracy is the principal trait to be measured in a test of
listening summary writing ability, but that it may also be useful



¹⁰A number of government translators had advised us to do
this.
to have an Expression score in order to identify examinees whose
work may need to be reviewed before being used in a legal
proceeding.

As indicated in section 1.4.1, soon after scoring the LSTE,
we began scoring the Spanish-English Verbatim Translation Exam
(SEVTE), a parallel test in the opposite direction. We soon
realized that it would not be possible to use the SLDs to score
the paragraph translation portion of these tests, since
performance on the criteria relating to Accuracy was often
incongruous with performance on the criteria relating to
Expression. At that point, it became apparent that the solution
to this problem lay in considering Accuracy and Expression as
separate constructs and assigning separate scores to each. We
applied this same approach to the scoring of the ESVTE. This
decision to divide translation ability into two constructs is
supported by the many analyses reported in the section on
validity of the SEVTE report (see Stansfield et al., 1990b).¹¹
Thus, while we began this project believing that translation
ability in both directions could possibly be represented in a
single rating, we ended the project having learned that four
scores are necessary to represent translation ability, i.e., two
for each direction. These scores do not describe the
psychological construct or ability, but they do identify and 



¹¹Due to lack of variation in English language proficiency
among the sample, the division of translation ability into two
constructs was not validated for this sample on the ESVTE. For
further information, see section 7.2 of this report.
define the measurement constructs.

It should be noted that the ESVTE validation data did not 
verify the separation of the construct of translation ability 
into dimensions of Accuracy and Expression. However, this
appeared to be due to the characteristics of the sample, which 
had uniformly high English proficiency. Thus, in the ESVTE study 
we also learned that proficiency in the language of the source 
document shows a threshold effect. Once a certain level of 
proficiency in the knowledge of the source document language is 
attained, variations in proficiency above the threshold level are 
not significantly related to translation ability. 

In order to gain an understanding of the psychological 
construct, psychologists and applied linguists will have to turn 
their attention to the process of translation. A description of
these processes is essential to understanding the construct of 
translation ability. 

Due to the lack of relevant research on translation, this 
project was begun without an understanding of the construct to be 
measured. We ended the project without an understanding of the 
process of translation, but with the belief that we had at least 
subdivided the construct in a practical way so that instruments 
can be developed to measure it. We believe the instrument 
described in the remaining sections of this report is a good one.
However, in the coming decades other researchers will develop
other instruments that may have greater reliability due to
improved scoring procedures, or greater validity due to a better
understanding of the psycholinguistic processes involved in
translation. Nevertheless, we believe that future instruments
measuring translation ability will continue to focus on the
constructs of accuracy and expression that have emerged from this
project. Thus, at this point, for the purpose of measurement, we
believe it is possible to define the construct of translation as
the ability to accurately render content information from a
source language text to a target language text and the ability
to express this information using appropriate target language
grammar, syntax, vocabulary, mechanics, style, and tone.
2. General Description

The English-Spanish Verbatim Translation Exam (ESVTE) is
designed to assess the ability to render a verbatim translation
into Spanish of source material written in English.

The ESVTE consists of two subtests. The first, referred to
in this part of the report as the Multiple Choice section,
consists of embedded phrase translation and error detection
items. The second subtest, referred to as the Production
section, requires translation of embedded phrases, sentences, and
paragraphs. A separate test booklet, containing instructions,
examples, and test items, is provided for each subtest. There
are two forms of the ESVTE; they are generally parallel in
content, item difficulty, format, and length.
2.1. Multiple Choice Section

This section of the report describes the format, test-taking
procedures, and scoring procedures for the Multiple Choice
section of the ESVTE.
2.1.1. Format

There are 60 items in the Multiple Choice section: 35 are
Words and Phrases in Context (WPC) items, and 25 are Error
Detection (ED) items. In a WPC item, an examinee is required to
select the best translation of an underlined word or phrase
within a sentence. In an ED item, an examinee must identify
where an error is located within the sentence, or indicate that
there is no error. ED items are written in the target language
only; errors may consist of incorrect grammar, word order,
vocabulary, punctuation, or spelling. (There is no more than one
error per item.)

The multiple choice items are designed to test specific
grammar points such as subject-verb agreement, verb tense
(preterit vs. imperfect, subjunctive, etc.), pronouns,
prepositions, gender, or word order; or vocabulary, including
noun, verb, adverbial, and adjectival phrases, and false
cognates. The results of a content analysis¹² of the ESVTE
Multiple Choice sections are displayed in Appendix D. Briefly,
43-47% of the items assess knowledge of grammar, 52-53% assess
knowledge of vocabulary, and 5% assess knowledge of mechanics
(spelling or punctuation), while 8% of the items contain no
error.¹³

The test booklet contains instructions, example items for
each subsection (WPC and ED), explanations of the example items,
and the test items. Appendix B contains selected portions of a 
test booklet for the Multiple Choice section, including the cover 
page, instructions, and example items. This appendix can be used 
by the FBI to construct an examinee handbook. 
2.1.2. Test Taking

Each examinee receives a Multiple Choice section test 
booklet, a machine-scoreable answer sheet, and two No. 2 pencils.



¹²The content analysis of the test was carried out by CAL staff
and then verified by FBI Headquarters staff.

¹³Some of the items test knowledge of more than one aspect
of language.
Examinees listen as the test supervisor reads instructions for

booklet cover page. Subsequently, they are given 35 minutes to
complete the Multiple Choice section. 
2.1.3. Scoring Procedures

Examinees record their responses to the Multiple Choice 
section of the ESVTE on answer sheets, which are scored by
machine. The score on this section is the number of answers
correct. The maximum possible score is 60. 
2.2. Production Section 

This section of the report describes the format of the 
Production section as well as test taking and scoring procedures. 
2.2.1. Format

There are 28 production items on each exam form: 15 items,
called Word or Phrase Translation (WPT), require translation of
underlined words or phrases in sentences; 10 items, called
Sentence Translation (ST), require translation of complete
sentences; and three items, called Paragraph Translation (PT),
require translation of entire paragraphs.¹⁴

The test booklet contains instructions, an example of each 
item type (except for the paragraphs), a brief discussion of each
example item, and the test items. Space is provided in the 
booklet for the examinee to write the translation below each



¹⁴The paragraphs on the ESVTE forms range from 66 to 91
words in length, averaging 84 words per paragraph. The sentences
in the Sentence Translation subsection range from 8 to 17 words
in length.
item. Appendix C contains selected portions of a test booklet
for the Production section, including the cover page,
instructions, and example items. (The reader may find it helpful
to refer to these now in order to get a better understanding of
the nature of the ESVTE.)
2.2.2. Test Taking

Examinees are given 35 minutes to complete the first two 
subsections (WPT and ST) and 48 minutes to complete the paragraph 
subsection. They are permitted to use dictionaries only in 
translating the paragraphs. 
2.2.3. Scoring

As noted above, examinees write their translations in the 
test booklet. Each subsection is scored by a trained rater 
according to the procedures outlined below. 

2.2.3.1. Words or Phrases in Sentences Items 

The keys for this subsection are quite comprehensive, 
containing a number of acceptable translations for each item. 
However, when scoring the test, a rater is free to accept
other appropriate translations that are not included in the key
if he or she believes the translation is correct. The items are
scored as either correct or incorrect, regardless of whether an 
error consists of incorrect grammar, word choice, or syntax. One 
point is awarded for each correct translation; hence, the maximum 
score for this subsection is 15 points. 
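The right-or-wrong rule for these items can be expressed as a short Python sketch. It assumes, hypothetically, that each item's key is stored as a set of acceptable translations and that a rater's decision to accept an unkeyed translation is recorded as a set of item indices; neither data format is specified in this report, and the only normalization applied (trimming surrounding whitespace) is likewise an assumption.

```python
def score_word_phrase_items(responses, keys, rater_accepted=frozenset()):
    """Return the Word or Phrase Translation score: 1 point per correct item.

    responses:      examinee translations, one string per item
    keys:           one set of keyed (acceptable) translations per item
    rater_accepted: hypothetical record of item indices where the rater
                    accepted a translation that is not in the key
    """
    if len(responses) != len(keys):
        raise ValueError("need exactly one key per response")
    score = 0
    for index, (response, acceptable) in enumerate(zip(responses, keys)):
        # Assumption: only leading/trailing whitespace is ignored in matching.
        if response.strip() in acceptable or index in rater_accepted:
            score += 1  # each item is simply right or wrong
    return score


# Made-up example keys and responses, for illustration only.
keys = [{"a pesar de"}, {"sin embargo", "no obstante"}, {"hubiera"}]
responses = ["a pesar de", "pero", "hubiese"]
# Item 2 ("hubiese") is not keyed, but the rater accepts it.
print(score_word_phrase_items(responses, keys, rater_accepted={2}))  # 2
```

On a full form with 15 items, the same function returns at most 15.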

2.2.3.2. Sentence Translation Items 

The keys for this subsection contain several acceptable
translations for each item, although the keys do not purport to
list all possible acceptable translations. A trained rater
assesses the Accuracy of the translations, i.e., the extent to
which the original meaning has been appropriately conveyed. From
0 to 5 points are awarded for the translation of each sentence,
according to the scoring guidelines found in Appendix E. As
there are 10 sentences, a maximum of 50 points is possible for
this subsection.
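The arithmetic for this subsection can be sketched in Python as follows. The sketch assumes the rater's per-sentence ratings are already recorded as whole numbers; the guidelines in Appendix E, which determine what each rating means, are not encoded here, and the function name is hypothetical.

```python
SENTENCE_ITEMS = 10        # Sentence Translation items per form
MAX_SENTENCE_POINTS = 5    # each sentence is rated 0-5 for Accuracy


def score_sentence_items(ratings):
    """Sum the rater's 0-5 Accuracy ratings for the 10 sentences (max 50)."""
    if len(ratings) != SENTENCE_ITEMS:
        raise ValueError(f"expected {SENTENCE_ITEMS} ratings, got {len(ratings)}")
    for rating in ratings:
        # Assumption: ratings are whole points on the 0-5 scale.
        if not isinstance(rating, int) or not 0 <= rating <= MAX_SENTENCE_POINTS:
            raise ValueError(f"invalid sentence rating: {rating!r}")
    return sum(ratings)


print(score_sentence_items([3, 4, 5, 2, 0, 5, 5, 4, 3, 1]))  # 32
```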

2.2.3.3. Paragraph Translation Items

The keys for this subsection provide only one translation 
for each paragraph, even though a number of slightly different 
but acceptable versions are possible. The example translation is 
intended to provide a standard interpretation of the source text, 
and raters may use their expertise in the language to judge 
whether variations in examinee renditions remain faithful to the 
original meaning. In addition, the rater training
materials provide several examples of translations at different 
ability levels, along with appropriate scores for each 
translation. 

Examinee translations are evaluated for correctness of 
Grammar (morphology), Expression¹⁵ (in the case of the Paragraph
Translation items only, Expression refers to word order and
vocabulary), Mechanics (spelling and punctuation), and Accuracy

¹⁵The reader is advised not to confuse paragraph expression
with the overall Expression score. The overall Expression score
includes all criteria referred to in the SLDs other than
Accuracy.
(as described above). From 0 to 5 points are awarded in each
category for each paragraph. Since there are three Paragraph
Translation items, a total of 60 points is possible for this
subsection: 15 points for Accuracy and 45 for Expression.
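The tally for this subsection can be sketched in Python. The `ParagraphRating` record and the function name are hypothetical; the code only encodes the point ranges stated above (0 to 5 in each of four categories, three paragraphs) and the split between the 15 Accuracy points and the 45 Grammar, Expression, and Mechanics points that count toward the overall Expression score.

```python
from dataclasses import dataclass

PARAGRAPH_ITEMS = 3
MAX_CATEGORY_POINTS = 5


@dataclass(frozen=True)
class ParagraphRating:
    """Hypothetical record of a rater's 0-5 points for one paragraph."""
    grammar: int      # morphology
    expression: int   # paragraph Expression: word order and vocabulary
    mechanics: int    # spelling and punctuation
    accuracy: int


def score_paragraph_items(ratings):
    """Return (accuracy, expression) for the Paragraph Translation subsection.

    Accuracy is at most 3 x 5 = 15; Grammar + Expression + Mechanics, which
    feed the overall Expression score, are at most 3 x 3 x 5 = 45.
    """
    if len(ratings) != PARAGRAPH_ITEMS:
        raise ValueError(f"expected {PARAGRAPH_ITEMS} paragraph ratings")
    for rating in ratings:
        for points in (rating.grammar, rating.expression,
                       rating.mechanics, rating.accuracy):
            if not 0 <= points <= MAX_CATEGORY_POINTS:
                raise ValueError(f"category points out of range: {points}")
    accuracy = sum(r.accuracy for r in ratings)
    expression = sum(r.grammar + r.expression + r.mechanics for r in ratings)
    return accuracy, expression


# Made-up ratings for one examinee, for illustration only.
ratings = [
    ParagraphRating(grammar=4, expression=4, mechanics=5, accuracy=4),
    ParagraphRating(grammar=3, expression=4, mechanics=4, accuracy=3),
    ParagraphRating(grammar=5, expression=5, mechanics=5, accuracy=5),
]
print(score_paragraph_items(ratings))  # (12, 39)
```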
2.3. Computation of Total Scores

A total score is computed separately for Accuracy and 
Expression. (See the discussion of these constructs in section
1.5.3.) A maximum score of 185 points (80 for Accuracy and 105
for Expression) is possible for the entire exam. The total for
Accuracy and Expression is then converted to a Translation
proficiency rating (one of the new CAL/FBI Skill Level
Descriptions) using the conversion tables (one for each exam
form) found in Appendix O. The development of these conversion
tables is described in section 8.3 of this report. 
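The tables in Appendix O are not reproduced in this section, but a lookup of this kind can be sketched in Python. The sketch assumes, hypothetically, that each table is stored as a list of (lowest raw score in band, rating) pairs in ascending order; the function name and data layout are illustrative only, and the actual bands must be taken from Appendix O for the form administered.

```python
from bisect import bisect_right


def convert_raw_score(raw_score, table):
    """Look up the rating band that contains raw_score.

    table: hypothetical layout of (lowest raw score in band, rating) pairs,
           sorted by lowest raw score; one such table per exam form.
    """
    lower_bounds = [lower for lower, _ in table]
    if lower_bounds != sorted(lower_bounds):
        raise ValueError("table must be sorted by lowest raw score")
    # bisect_right finds the first band that starts above raw_score;
    # the band just before it is the one that contains raw_score.
    position = bisect_right(lower_bounds, raw_score) - 1
    if position < 0:
        raise ValueError("raw score falls below the lowest band")
    return table[position][1]
```

Because the report keeps one table per form, the caller passes the table for the form the examinee took, and the same lookup serves both forms.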

The total score for Expression is composed of the 60 items 
in the Multiple Choice section, which are worth up to 60 points,
plus the sum of the points earned for Grammar, Expression, and 
Mechanics (up to 45 possible) on the Paragraph Translation 
subsection of the Production section. Thus, the examinee may 
obtain a raw score of up to 105 points for Expression. 

The total score for Accuracy is composed of the 80 points
that may be earned on the Production section. The examinee may
earn 15 points for Accuracy in the Word and Phrase Translation
items, 50 points for Accuracy in the Sentence Translation items
(up to 5 points for each of 10 sentences), and 15 points for
Accuracy on the three paragraphs (up to five points per
paragraph).¹⁶
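The raw-score arithmetic of this section can be summarized in a short Python sketch. The function and parameter names are illustrative, and the inputs are assumed to be the subsection scores described in sections 2.1.3 and 2.2.3. Because raw points are simply added with no further weighting, the Sentence Translation items (up to 50 points) weigh more than three times as much in the Accuracy total as the paragraphs (up to 15).

```python
MAX_ACCURACY = 80     # 15 (WPT) + 50 (ST) + 15 (PT Accuracy)
MAX_EXPRESSION = 105  # 60 (Multiple Choice) + 45 (PT Grammar/Expression/Mechanics)


def total_scores(mc_correct, wpt_points, st_points, pt_accuracy, pt_expression):
    """Combine subsection raw scores into (Accuracy, Expression) totals."""
    accuracy = wpt_points + st_points + pt_accuracy
    expression = mc_correct + pt_expression
    if not 0 <= accuracy <= MAX_ACCURACY:
        raise ValueError(f"Accuracy total out of range: {accuracy}")
    if not 0 <= expression <= MAX_EXPRESSION:
        raise ValueError(f"Expression total out of range: {expression}")
    return accuracy, expression


# A made-up examinee, for illustration only.
print(total_scores(mc_correct=48, wpt_points=12, st_points=32,
                   pt_accuracy=12, pt_expression=39))  # (56, 87)
```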

2.4. Use of Multiple Choice Section for Screening

The Multiple Choice section may be used to screen out
individuals for whom the Production section of the exam would be
inappropriate. Since the minimum recommended passing score is
2.8, or a 2+ on the Translation Skill Level Descriptions,
examinees who have a reasonable chance of scoring at this
level should not be screened out. Prior FBI policy has
established a 2.0 as a screen (previously based on a DLPT reading
score), and CAL was requested to continue this practice by using
the Multiple Choice section score corresponding to a 2.0 on the
entire ESVTE as a screen. Through statistical analyses
(described in section 8.4), we have determined that the raw score
cut-off on the Multiple Choice section should be 23 for Forms 1
and 2. Examinees scoring at or below this cut-off need not take the
Production section of the ESVTE, since they are unlikely to have
a translation skill level at 2.8 or above when the entire exam is
administered. If they have already taken the Production section,
it need not be scored.
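The screening rule amounts to a single comparison against the raw-score cut-off of 23 given above. A minimal Python sketch, with a hypothetical function name:

```python
MC_ITEMS = 60
MC_SCREENING_CUTOFF = 23  # raw-score cut-off stated above


def should_score_production(mc_raw_score, cutoff=MC_SCREENING_CUTOFF):
    """True if the Production section should be taken (or scored).

    Examinees scoring at or below the cut-off are screened out.
    """
    if not 0 <= mc_raw_score <= MC_ITEMS:
        raise ValueError(f"Multiple Choice score must be 0-{MC_ITEMS}")
    return mc_raw_score > cutoff


for score in (20, 23, 24):
    print(score, should_score_production(score))
# 20 False
# 23 False
# 24 True
```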



¹⁶As explained later in this report, a multiple regression
analysis did not improve on this raw score weighting. Thus, it 
was decided to use this weighting to calculate the total score 
for Accuracy. The effect of this weighting is that the Sentence 
Translation subsection counts more than three times as much as 
the Paragraphs subsection due to the number of raw score points 
that are earned on each. 




3. Development of the ESVTE

This section describes the development of the two pilot 
forms of the ESVTE. The preparation of examination materials and
the development of pilot study scoring methods are also
discussed.
3.1. Exam Forms

Items for the ESVTE were developed by CAL staff and
consultants, taking into account the results of the survey of FBI
translation needs (see section 1.3), which are reported in
Appendix Q of this report. They relied on their
expertise as translators and teachers in developing the items. 
The item developers sought to test aspects of English that are 
especially challenging to translate because there is no direct 
equivalent in Spanish. The developers also focused on aspects of 
grammar that have traditionally caused problems for 
English/Spanish translators and students because there is no 
direct correspondence between the two languages. These areas 
include pronouns, verb tenses and sequence of verb tenses, use of
negatives, possessives, prepositions, and non-temporal verb forms 
(infinitives, gerunds, past participles), among others. 

A number of item texts were either excerpted directly from 
documents provided by the FBI or were paraphrases of such 
documents. In addition, many items were paraphrased from 
newspaper and magazine articles and documents encountered in the 
professional work of the item developers. The developers
selected the material carefully, so that the topics and
vocabulary of the item texts would be consistent with the type of
documents FBI employees reported being required to translate on 
the survey of FBI translation needs. 

Parallel forms were organized by matching items according to
point being tested (specific grammar point or vocabulary) and by 
matching them in terms of difficulty on the FBI/CAL SLDs for 
translation. This latter matching required the test developers 
to make an estimate of the difficulty of rendering the 
translation, rather than of the difficulty of the language of the 
item itself in either the source or target language. The items 
were originally arranged in order of increasing difficulty. More 
items were developed than we anticipated would be needed on the 
final forms, so that items that did not function effectively 
could be discarded after pilot testing. Originally, there were 
64 items (35 Words or Phrases in Context and 29 Error Detection)
in the Multiple Choice section of Form 1 and Form 2. The 
Production sections of both forms contained 22 Word or Phrase 
Translation items, 15 Sentence Translation items, and three 
Paragraph Translation items. 

Following extensive internal review, CAL sent the ESVTE exam
forms to the FBI for preliminary approval and revised them 
according to FBI suggestions prior to trialing. 
3.2. Pilot Test Scoring Procedures 

Answer keys were prepared for the Multiple Choice and
Production sections. The keys were reviewed by FBI staff
members, and a number of their suggestions were incorporated in
making revisions.

Originally, examinee responses to the Multiple Choice 
section were to be scored by an optical scanner, which would
tabulate the number of correct answers. Examinee translations of 
the Word or Phrase Translation items in the Production section 
were to be scored by raters as being either correct or incorrect, 
according to the keys which had been prepared. 

In contrast, scoring of the Sentence Translations and 
Paragraph Translations was to be based on the new FBI/CAL 
Translation Skill Level Descriptions. The Translation Skill 
Level Descriptions were intended to characterize an examinee's 
performance on a range of materials. Thus, it was not possible 
to use them to score individual sentence items because these item 
texts were too restricted. Consequently, CAL staff developed 
simplified scoring guidelines, based on the FBI/CAL translation 
skill level descriptions, for evaluating both ST and PT items. 

In preparation for writing the simplified guidelines, the 
FBI/CAL skill level descriptions were reorganized so that all 
proficiency levels were described within each category, i.e.,
Grammar, Syntax, Vocabulary, Mechanics, Accuracy, and Style and
Tone. (For example, references to grammar in levels 0+ to 5 were
all placed on the same page.) 

After studying these reorganized skill level descriptions, 
an attempt was made to characterize each level succinctly within 
each category. The plus levels were eliminated, so that the 
scale consisted of 0-5 points in each category. Because exam
texts were based primarily on legal and business documents (i.e.,
formal writing), which did not vary much in terms of Style and
Tone, it was decided not to include Style and Tone as separate
categories in the scoring system. The Vocabulary category was
also eliminated, since aspects of this category could be subsumed
under Expression and Accuracy. Finally, correctness in Mechanics
(spelling and punctuation) was expressed in terms of numbers of
errors for the Sentence Scoring Grid, and proportions of items
correct for the Paragraph Scoring Grid. The pilot version of the
Sentence Scoring Grid is located in Appendix G; the Paragraph
Scoring Grid can be found in Appendix H.
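The reorganization described above amounts to regrouping a level-by-category table as a category-by-level table and then dropping the plus levels. The following is a minimal Python sketch, assuming each level's descriptions are stored as short text keyed by category; the descriptor strings and the function name `by_category` are hypothetical placeholders, not the FBI/CAL text.

```python
from collections import defaultdict

# Hypothetical placeholder descriptors; the real FBI/CAL SLD text is not
# reproduced here. Outer keys are skill levels, inner keys are categories.
SLD_BY_LEVEL = {
    "2": {"Grammar": "L2 grammar text", "Mechanics": "L2 mechanics text"},
    "2+": {"Grammar": "L2+ grammar text", "Mechanics": "L2+ mechanics text"},
    "3": {"Grammar": "L3 grammar text", "Mechanics": "L3 mechanics text"},
}


def by_category(sld_by_level, drop_plus_levels=True):
    """Regroup {level: {category: text}} as {category: {level: text}}.

    With drop_plus_levels=True, levels such as "2+" are skipped, mirroring
    the move to a whole-number 0-5 scale.
    """
    regrouped = defaultdict(dict)
    for level, categories in sld_by_level.items():
        if drop_plus_levels and level.endswith("+"):
            continue
        for category, text in categories.items():
            regrouped[category][level] = text
    return dict(regrouped)


for category, levels in by_category(SLD_BY_LEVEL).items():
    print(category, sorted(levels))
```

Keeping the plus levels (`drop_plus_levels=False`) gives the intermediate, fully reorganized descriptions that were studied before the scale was simplified.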
4. Trialing and Pilot Testing

This section describes the trialing and piloting of the 
ESVTE. The results of the piloting and subsequent revisions are 
also discussed. 
4.1. Trialing

The trialing of the two forms of the ESVTE was carried out 
at CAL on February 17, 1989. Three CAL employees and one CAL
spouse took the exams. The Spanish oral proficiency levels of 
these four people varied from level 2 to level 5, the latter 
being a practicing attorney who is an educated native speaker 
from Argentina. 

Before taking each form, examinees also completed a 
questionnaire that asked them to provide a global rating of their 
English and Spanish proficiency (see Appendix J). After 
completing each section of the test, they commented on it and 
noted on the questionnaires (see Appendix K) specific errors or 
problems they encountered. 

CAL examined the responses both to each item and to the 
questionnaire in order to determine which items should be 
modified and which should be deleted, and the exam forms were 
revised accordingly. 

On March 29, 1989, two FBI translators each took either Form
1 or Form 2 of the ESVTE. They provided written feedback to CAL 
which was taken into consideration in revising the exams after 
the pilot testing. 
4.2. Pilot Testing

This section describes the ESVTE pilot data collection, the 
results of pilot testing, and the revisions that were made
following data analysis. 
4.2.1. Data Collection 

The ESVTE exam forms were piloted at Georgetown University
on April 1, 1989. Forty-four undergraduate students from the 
Department of Translation and Interpretation completed the 
Multiple Choice sections of both forms. Each student was paid 
$12.50 for taking the sections. Graduate students in the 
Translation Certificate program took the complete exam; six 
students took Form 1 and five took Form 2. Each of these 
students was paid $15 for taking one form of the entire ESVTE 
exam. All examinees took the pretest exams together as a group. 

Of the 50 students who participated in the pretesting, 
English was the native language of 37 and Spanish was the native 
language of 7. Six students indicated another native language, 
but knew some Spanish. These other native languages were 
Portuguese, Tagalog, Korean, Chinese, Russian, and Italian. 

The Georgetown University students kept track of how many 
minutes it took them to complete each section of the exam. They 
also completed a questionnaire regarding their native language 
background and their proficiency in English and Spanish. 
(Appendix M contains a copy of the questionnaire; a summary of 
examinee responses is also located in Appendix M.) In addition, 
we asked students to comment on any items that were confusing or
that caused them particular difficulty.
4.2.2. Results

Table 1 displays a summary of the performance of the pilot 
study examinees on the Multiple Choice sections of the ESVTE exam 
forms. Reliability estimates, calculated using Kuder-Richardson
formula 20 (KR-20), are also shown.

Table 1

ESVTE Multiple Choice Sections
Total Pilot Sample

Form    N     Mean    Mean %    Std. Dev.    KR-20

1       50    29.4    46        11.45        .92
2       49    28.5    45        10.07        .88

Note. There were 64 items on the pilot version of Forms 1 and 2.


Using the mean percentage correct to compare the two forms, it is
apparent that Form 2 was slightly more difficult than Form 1,
although both forms appeared to be somewhat difficult for this
group of examinees. The reliability estimates were fairly
high, indicating that most of the items were functioning well
(i.e., they were neither too easy nor too difficult, and
generally discriminated well between high- and low-proficiency
examinees).



KR-20 yields an estimate of the internal consistency of
the test items, i.e., a measure of the extent to which examinees
perform consistently across the items within a test. It is very
similar to parallel form reliability.

A four-option, multiple-choice exam of optimal difficulty
would exhibit a mean score of 62.5% correct.
A record was kept of the time it took students to complete
the Multiple Choice sections. The amount of time required ranged
from 24 to 31 minutes.

Since only a few examinees took the Production sections, 
descriptive statistics for this section were not calculated. The 
principal goals in piloting the Production sections were to 
evaluate the appropriateness of the scoring system, and to 
identify items that were either ambiguous, too easy, or too 
difficult. 
4.2.3. Revisions 

Students were divided by native language background 
(English, Spanish, and other), and item analyses were conducted
of their responses to the Multiple Choice section items. The 
results showed that the items were easier for the six native 
Spanish speakers. 

Since the item analyses showed that some of the items on 
both forms of the Multiple Choice section did not discriminate 
well, it was necessary to write a few new items and to revise a 
number of the existing items to make them more difficult. The 
revision process involved shortening the test by deleting some
items and replacing others with new items that assessed a similar
grammar point or vocabulary item. Some of the distractors in a
number of the remaining items were also modified. Comments
written by students after completing the exam were taken into
consideration in identifying items for revision. We decided to
include 35 Word or Phrase in Context items and 25 Error Detection
items, for a total of 60 items, in the final form of the Multiple
Choice section. This is slightly fewer than the 64 items
included on the field test versions of the ESVTE.

For the final version of Form 1, 4 (7%) new items were 
developed, and 29 (12%) of the distractors were modified; for 
Form 2, 5 (8%) new items were developed, and 20 (14%) of the 
distractors were revised. In general, the new items were 
designed to be more difficult, while the distractors were 
rewritten so that they would be more attractive to examinees. 

Responses to the Production sections were scored by CAL 
staff and consultants in order to try out the scoring procedures 
and to gather information that could be used in revising items. 
As with the Multiple Choice section, the Production section items
were analyzed in light of student performance (and comments from
FBI staff, as noted above). It was decided to include 15 embedded
phrase, 10 sentence, and 3 paragraph translation items on the 
final versions of the exam forms. Twenty-one (78%) of the phrase 
and sentence items were deleted from Form 1, and 8 new items were 
created; 22 (81%) were deleted from Form 2, and 9 new items were 
created. None of the paragraph items were modified. 

The test booklets were revised to reflect the changes 
described above, and copies were made in preparation for the
validation study described in section 5 of this report. 






5. Validation Study 

reliability and validity of the ESVTE as a measure of translation
ability. In this context, the validation study had a number of
specific aims. One aim was to field test the revised exam to see
if its items and sections performed acceptably. Another aim was
to administer the test to a more appropriate population than the
pretest versions' population in order to set passing scores based
on their performance. A further aim was to continue assessing the
rating criteria that had been developed for scoring each part of
the Production section. Another was to determine whether this
section could be scored reliably. As the word "validation"
implies, the validation study also sought to gather information
on the validity of the test. With the analysis of construct
validity in mind, it was decided to collect scores on other
measures from employee files and to assess the test's ability to
predict overall translation ability by having raters make an
overall assessment of ability using the FBI/CAL Translation SLDs.
Another aim of the validation study was to gather evidence
concerning criterion-related validity by having examinees rate
their ability to translate various types of texts on the job, and
then to determine the relationship between scores on the test and
the self-ratings. We chose to use self-ratings, rather than
supervisors' ratings, because we were advised by the FBI that



The population that took the field test version consisted
mostly of university students.
supervisors would not be in a position to evaluate translation
[...] to be a valid evaluation of their translation ability. An
additional aim was to gain a further understanding of the
constructs the test measured; at the time we were not sure if we
were measuring a single construct, two or more constructs, or
a test method effect (recognition versus production). Another
purpose of the validation study was to determine the most
appropriate weighting of the parts and sections. A final purpose
of the validation study was to gather the data necessary to equate
the two parallel forms of the test. This section describes the
validation study design and data collection procedures. The
results of the study are discussed in the following three
sections.
5.1. Overview 

The design of the validation study called for administering 
the ESVTE to FBI Language Specialists, agents, and other 
employees at various field offices around the country. It was 



This degree of uncertainty and the multiple aims of the 
validation study were due to the fact that so little was known 
about the measurement of translation ability at the time the 
project began. Thus, the validation study, and indeed the entire 
project, combined experimentation with a commitment to
develop and validate a test. To draw an analogy to the business
world, it is as if we were carrying out both the research and
development function and the manufacturing function at the same
time. Under normal circumstances the manufacturing function is
carried out after the R+D function has been completed. While far
from ideal, the reality of our situation was that we were working 
under a fixed-price contract to manufacture a test. The client 
was aware of the possibility of R+D problems, and assumed that 
these would be worked out along the way. 
hoped that by administering the test to a variety of employees,
individuals of varying ability levels would be included. In
order to examine the validity of the ESVTE, scores on other
measures of language ability were obtained from available 
employee files. 

Both forms of the ESVTE were given in one sitting (about 
four hours in duration) at each of seven FBI field offices. The 
order of administration of the forms was counterbalanced to 
control for the practice effect. Thus, approximately half of the 
examinees took Form 1 first and the other half took Form 2 first. 
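The report does not say exactly how examinees were assigned to the two orders. One simple scheme that yields approximately equal groups is to shuffle the examinee list and then alternate the order; the sketch below assumes that scheme, and `assign_form_order` is a hypothetical helper rather than part of the study materials.

```python
import random


def assign_form_order(examinee_ids, seed=None):
    """Counterbalance form order: optionally shuffle the examinees, then
    alternate so that about half take Form 1 first and half Form 2 first."""
    ids = list(examinee_ids)
    if seed is not None:
        random.Random(seed).shuffle(ids)
    orders = (("Form 1", "Form 2"), ("Form 2", "Form 1"))
    return {eid: orders[i % 2] for i, eid in enumerate(ids)}


plan = assign_form_order(range(1, 43), seed=1989)
form1_first = sum(order[0] == "Form 1" for order in plan.values())
print(form1_first, len(plan) - form1_first)  # 21 21
```

With an even number of examinees the two groups are exactly equal; with an odd number they differ by one, which is consistent with "approximately half".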
5.1.1. Test Administration Instructions 

CAL developed a set of test administration instructions for 
the ESVTE. These included instructions to the test administrator 
regarding the following: 1) test security, 2) assembling test 
materials, 3) arranging for a testing site, 4) equipment, 5) 
administering the test (including timing of sections), and 6) 
procedures to follow after the test. Appendix A contains a copy 
of the administration instructions for the ESVTE. 
5.1.2. Questionnaires

CAL developed two questionnaires for use in the validation 
study: 1) a self-assessment questionnaire on which an examinee 
was asked to estimate his or her ability to render a verbatim 
translation from Spanish into English, and 2) a questionnaire 
requesting examinee feedback on aspects of the format and content 
of the exam. (A copy of the self-assessment questionnaire is 
located in Appendix N, and a copy of the exam feedback 
questionnaire is in Appendix L.)
5.1.3. Subjects

Testing materials, including test administration 
instructions, numbered test booklets, answer sheets, pencils, 
questionnaires, and test administrator report forms were sent
to the FBI field offices in Los Angeles, San Diego, Albuquerque,
Phoenix, and El Paso on November 15, 1989. Similar sets of
materials were sent to Houston and Puerto Rico on November 17,
1989. Materials from the ESVTE administrations were returned to
CAL within three to ten weeks.



CAL developed this form for test administrators to note
any irregularities that might occur with respect to test security,
the test administration, or the condition of the test materials.
We requested that the validation study test administrators
complete and sign the form even if there were no irregularities.
(See Appendix A for an example of this form.)



Arrangements were made for members of the Houston Police
Department (for whom Spanish Oral Proficiency Interview (OPI)
scores were available) to be tested along with the FBI employees
at the Houston field office.

A cover letter was sent with the materials to the contact
person at each field office. In addition to thanking them for 
their assistance in carrying out the validation study, the letter 
emphasized the importance of test security, outlined the 
procedures for the test administration, noted the proposed 
administration date, and instructed them to return all materials 
to CAL immediately after the test administration. A checklist of 
the materials was enclosed with each cover letter. CAL retained 
a copy of the checklists and used them to verify that all of the 
materials were returned as requested. 

Although most field offices were able to follow the
administration procedures as outlined, a few had difficulty 
scheduling all of the examinees to be present for the test 
administration, and consequently had to give more than one 
administration of the same exam. These difficulties accounted 
for their delay in returning some of the exam materials. 
In an effort to ensure that the entire range of abilities of
potential test takers in the operational program would be
represented in the sample, CAL contracted three professional
translators to take the full ESVTE forms. These exams were
administered at CAL on January 9, 1990.

Hence, a total of 42 examinees took the ESVTE in the
validation study. Of this group, 13 (31%) were FBI Special
Agents, 11 (26%) were FBI Language Specialists (or contract
linguists, who do similar work), 10 (24%) were FBI support staff,
5 (12%) were members of the Houston Police Department, and 3 (7%)
were professional translators. It should be pointed out that
while it was originally envisioned that the subjects of the
validation study would be limited to Language Specialists, we
were unable to secure release time for an adequate sample of
Language Specialists to take the test. After discussing
alternatives with FBI Headquarters staff, it was decided to 
include other FBI personnel in the validation sample, as well as 
the other groups that were represented. 
5.2. Scoring

The Multiple Choice parts of the ESVTE forms were scored by 
machine, using answer keys based on the revised versions of the 
forms. 
The Production parts were scored by CAL consultants Ana
Maria Velasco and Matilde Farren, using the scoring keys and
analytic sentence and paragraph guidelines that had been
prepared. Word and Phrase Translation items were scored using a
key of acceptable responses, which has been provided to the FBI.
Sentence Translation items were scored using the Sentence 
Accuracy Scoring Guidelines (See Appendix E) . These focused on 
the presence of mistranslations, omissions, and inappropriate 
additions in the content of the translation, as well as on the 
conveyance of all appropriate nuances. 

In order to determine which scoring system was most 
efficient and yielded the highest interrater reliability, the 
Paragraph Translations were scored in two ways, a) using the 
analytic paragraph guidelines, and b) using the FBI/CAL 
translation skill level descriptions. The ESVTE Paragraph 
Scoring Guidelines (see Appendix F) require the rater to assign 
each paragraph from 0-5 points on each of four criteria: 
grammar, expression, mechanics, and accuracy. The totals for the 
first three criteria, grammar, expression, and mechanics, are 
summed to produce the Expression score for the Production 
section. The ratings from accuracy are summed and contribute to 
the total Accuracy score, which is earned exclusively on the 
Production section of the ESVTE. The scoring guidelines for



Both are certified by the American Translators
Association. Ms. Farren is also a certified Federal Court 
Interpreter. 
grammar require the rater to distinguish between errors in simple
structures, and to consider the number of errors of each type in
each paragraph. The scoring guidelines for expression require
the rater to evaluate the paragraph for word order, vocabulary,
idiomaticity, style, and tone. After considering these, the
rater judges the degree to which the translation follows the
conventions of the source language or of the target
language. The scoring guidelines for mechanics require the
rater to evaluate each paragraph for frequency of errors in
spelling, punctuation, and capitalization. The scoring
guidelines for accuracy are identical to the scoring guidelines
for Sentence Translation items. Additional information on the
scoring procedures can be found in sections 2.1.3 and 2.2.3 of
this report.
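The arithmetic of the paragraph scoring described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the ratings in the example are invented, and the Sentence Translation subtotal is passed in already summed because its point scheme (Appendix E) is not reproduced here. With three paragraphs and three Expression criteria each rated 0-5, the Expression total can range from 0 to 45, which matches the maximum reported in Table 9.

```python
EXPRESSION_CRITERIA = ("grammar", "expression", "mechanics")


def paragraph_totals(paragraph_ratings):
    """Sum per-paragraph ratings (each criterion 0-5) into the Expression
    total and the paragraph contribution to the Accuracy score."""
    for ratings in paragraph_ratings:
        for criterion, points in ratings.items():
            if not 0 <= points <= 5:
                raise ValueError(f"{criterion} rating out of range: {points}")
    expression_total = sum(
        ratings[c] for ratings in paragraph_ratings for c in EXPRESSION_CRITERIA
    )
    paragraph_accuracy = sum(ratings["accuracy"] for ratings in paragraph_ratings)
    return expression_total, paragraph_accuracy


def total_accuracy(word_phrase_correct, sentence_subtotal, paragraph_accuracy):
    """Accuracy is earned only on the Production section."""
    return word_phrase_correct + sentence_subtotal + paragraph_accuracy


# Invented ratings for one examinee's three paragraph translations.
example = [
    {"grammar": 4, "expression": 3, "mechanics": 5, "accuracy": 4},
    {"grammar": 3, "expression": 3, "mechanics": 4, "accuracy": 3},
    {"grammar": 5, "expression": 4, "mechanics": 5, "accuracy": 5},
]
expression, para_acc = paragraph_totals(example)
print(expression, para_acc)              # 36 12
print(total_accuracy(11, 30, para_acc))  # 53
```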

After the scoring of the Production section was complete, 
each rater assigned an overall ability level for Expression and 
Accuracy, based on evaluation of the sentence and paragraph 
translations. This overall ability level was used in order to 
construct the FBI/CAL Translation Scale conversion tables. 

It should be noted that initially it was hoped that a single 
translation ability level could be assigned to each examinee. 
The decision to score Expression and Accuracy separately was made 
by CAL after the data were collected, as a result of experience
gained during the pilot study and after the scoring of an initial
group of ESVTE papers from the validation study. This decision
was made to aid in evaluating different types of examinee
[...] but inaccurate (as may occur when an examinee's proficiency is
higher in the target language), while others were mostly accurate
but evidenced problems with grammar or vocabulary (as may occur
when an examinee's proficiency is higher in the source language).

In order to be able to assign separate FBI/CAL Expression 
and Accuracy scores, the original FBI/CAL Translation SLDs were 
reorganized so that the descriptions for Expression at each level 
were contained in one section and the descriptions for Accuracy 
in another. A copy of the reorganized SLDs can be found in 
Appendix I. 
The data on reliability that resulted from the validation
study test administration are presented in this section by order
of subtest. An effort was made to examine reliability in a 
number of ways and from a number of perspectives. It should be 
remembered that the data on reliability is a function of the 
sample tested and the raters used. 

6.1. Multiple Choice Section: Descriptive Statistics and 
Reliability 

Table 2 presents the results of the validation study 
administration of the Multiple Choice section of the ESVTE forms. 
This section is referred to here as MC1 and MC2.

Table 2 

Descriptive Statistics for ESVTE MC1 and MC2

Form    N     Mean    Std. Dev.    Minimum    Maximum

MC1     42    36.9     9.99        12         55

MC2     42    36.8    10.47        11         59



As can be seen in Table 2, the mean scores on both forms of 
the Multiple Choice sections were almost identical. This 
indicates that both forms are of about the same difficulty. The 
slightly larger standard deviation for MC2 suggests that less 
competent examinees may have tended to score slightly lower and 
more competent examinees slightly higher on MC2 than they did on 
MC1.

As there were a total of 60 items in the ESVTE Multiple
Choice section, the mean of 37 represents 62% correct. Thus, the
Multiple Choice section appears to be of optimal difficulty for
this sample.
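A common rule of thumb places the optimal mean halfway between the expected chance score and a perfect score; on a four-option test chance is 25%, so the midpoint is (25% + 100%) / 2 = 62.5%, the figure cited for this section. The short Python sketch below assumes that this rule is the source of the 62.5% figure; `optimal_mean_proportion` is a hypothetical name.

```python
def optimal_mean_proportion(n_options):
    """Rule of thumb: halfway between the chance score (1 / n_options)
    and a perfect score of 1.0."""
    chance = 1 / n_options
    return (chance + 1) / 2


p = optimal_mean_proportion(4)
print(p)          # 0.625
print(p * 60)     # 37.5 items: the optimal mean on the 60-item section
print(36.9 / 60)  # observed MC1 mean as a proportion of 60 items
```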

Table 3 presents the KR-20 reliability estimates for the two 
forms of the Multiple Choice section based on the validation 
study sample. KR-20 is a measure of internal consistency 
reliability, which is the degree to which the items (considered 
as a set) on a test measure the same ability. 
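For dichotomously scored items, KR-20 is usually written as KR-20 = k/(k-1) * (1 - sum(p_i * q_i) / var(X)), where k is the number of items, p_i is the proportion of examinees answering item i correctly, q_i = 1 - p_i, and var(X) is the variance of the total scores. The following is a minimal Python sketch, assuming a 0/1 response matrix and population (divide-by-N) variances; the toy data are invented.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for dichotomously scored items.

    responses: one row per examinee, each row a list of 0/1 item scores.
    Item p*q terms and the total-score variance both use N in the
    denominator, so the two are on the same footing.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people  # item difficulty
        pq_sum += p * (1 - p)
    return n_items / (n_items - 1) * (1 - pq_sum / total_var)


# Invented 4-examinee, 3-item response matrix.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(toy))  # 0.75
```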

Table 3 

KR-20 Reliability for ESVTE MC1 and MC2

Form    KR-20

MC1     .89
MC2     .91

The reliability of the Multiple Choice section of both ESVTE 
forms is high and indicates that either form can be used with 
confidence on a population similar to that of the validation 
study. 

A second indication of the reliability of the section is the 
consistency of performance of the group of 42 subjects on the two 
forms. Referred to as the coefficient of equivalence or parallel 
We expect a mean of 62.5% on a four-option multiple-choice
test of optimal difficulty for the population, when the sample
fully and equally represents the total range of abilities in the
population.
form reliability, this type of reliability is obtained by
calculating the Pearson Product Moment correlation between
subjects' performance on the two different forms. For the
multiple choice section on the two ESVTE forms, the coefficient
of equivalence is •SO, which is very high. Together, both the
KR-20 reliability estimates and the coefficient of equivalence 
are high, indicating that the two main sources of measurement 
error (inconsistency across items and inconsistency across forms) 
are minimal for the Multiple Choice section of the ESVTE. 
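Because every validation examinee took both forms, the coefficient of equivalence is the Pearson correlation between each examinee's MC1 and MC2 totals; the interrater coefficients in Table 5 use the same statistic on paired Rater 1 and Rater 2 scores. The following is a minimal Python sketch; the helper name `pearson_r` and the score lists are illustrative, not study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists (n >= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented MC1/MC2 totals for five examinees, listed in the same order.
mc1 = [30, 42, 25, 50, 37]
mc2 = [28, 45, 27, 49, 35]
print(round(pearson_r(mc1, mc2), 2))
```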
6.2. Production Section: Descriptive Statistics and Reliability 
of the Accuracy Score 

Table 4, which follows, shows the descriptive statistics for 
the ESVTE-Accuracy Subsections and Totals by form and by rater. 
Close examination of the means in Table 4 shows that the two 
raters appear to be consistent in their degree of severity, with 
Rater 1 always being more generous than Rater 2. Despite this 
consistent difference in raters, when mean scores are considered, 
the difficulty of the two forms appears very similar. Averaging 
the scores assigned by both raters, we see that the Word and 
Phrase Translations seem to be slightly harder on Form 1 (5.75 
versus 6.75 on Form 2), while the Sentence Translations seem to 
be slightly harder on Form 2 (24.8 versus 25.8 on Form 1). The 
Paragraphs also seem somewhat harder on Form 2 (6.5 on Form 1 and 
5.6 on Form 2). The average Total Score for Accuracy across the 
two forms differs by less than one point; it is 38.09 for Form 1 
and 37.17 for Form 2. Thus, in terms of total Accuracy scores, 
there seems to be little difference in the difficulty of the two
forms. 
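The rater-averaged means quoted above can be checked directly against the Table 4 means. This small Python sketch uses a dictionary layout and a helper, `rater_averaged_mean`, that are our own. Note that (39.99 + 34.36) / 2 is 37.175, which the text reports as 37.17.

```python
# Subsection means from Table 4, keyed by (rater, form).
TABLE4_MEANS = {
    "Word + Phrase": {("R1", "F1"): 6.5, ("R2", "F1"): 5.0,
                      ("R1", "F2"): 7.3, ("R2", "F2"): 6.2},
    "Sentences": {("R1", "F1"): 29.6, ("R2", "F1"): 22.0,
                  ("R1", "F2"): 26.9, ("R2", "F2"): 22.7},
    "Paragraphs": {("R1", "F1"): 8.1, ("R2", "F1"): 4.9,
                   ("R1", "F2"): 5.8, ("R2", "F2"): 5.4},
    "Total": {("R1", "F1"): 44.19, ("R2", "F1"): 31.99,
              ("R1", "F2"): 39.99, ("R2", "F2"): 34.36},
}


def rater_averaged_mean(means, form):
    """Average the raters' means for one form."""
    values = [v for (rater, f), v in means.items() if f == form]
    return sum(values) / len(values)


for measure, means in TABLE4_MEANS.items():
    f1 = rater_averaged_mean(means, "F1")
    f2 = rater_averaged_mean(means, "F2")
    print(f"{measure}: Form 1 {f1:.3f}, Form 2 {f2:.3f}")
```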



Table 4

Descriptive Statistics for ESVTE Accuracy
Form 1 and Form 2 (N=42)

Measure          Mean     Std. Dev.    Minimum    Maximum

Word + Phrase
  R1 F1           6.5      4.0          0          15
  R2 F1           5.0      3.9          0          13
  R1 F2           7.3      3.9          0          15
  R2 F2           6.2      3.7          0          14

Sentences
  R1 F1          29.6     11.1          2          48
  R2 F1          22.0     10.5          3          45
  R1 F2          26.9     10.3          5          46
  R2 F2          22.7     10.1          3          48

Paragraphs
  R1 F1           8.1      2.6          3          13
  R2 F1           4.9      2.1          0          10
  R1 F2           5.8      3.5          0          15
  R2 F2           5.4      2.4          2          13

Total
  R1 F1          44.19    16.02         8          74
  R2 F1          31.99    15.55         6          56
  R1 F2          39.99    15.83         6          76
  R2 F2          34.36    15.19         7          75

Legend: R = rater, F = form. Thus R1 F1 is the score assigned by
rater 1 on form 1.



In discussing the reliability of the ESVTE Accuracy scores,
there are two sources of measurement error that need to be
examined: inconsistencies across raters and inconsistencies
across forms. Traditionally these have been examined separately,
but contemporary generalizability theory allows us to look at
both together. In this discussion we will first examine these
two sources of error separately by examining interrater
reliability and parallel form reliability. We will conclude with
an examination of the results of a generalizability study on the
data.

Table 5 shows the interrater reliability (Pearson Product 
Moment Correlations) of the ESVTE Subsections and the total 
Production section score for Accuracy. The reliability for Form 
1 is listed first, followed by the reliability for Form 2. 

Table 5 
Interrater Reliability of 
ESVTE Production Subsections and Production Total 
for Accuracy (Forms 1+2) 

Form 1 Form 2 
Word and Phrase .94 .84 
Sentences .87 .78 
Paragraph (Accuracy) .61 .61 

Total Accuracy .92 .83 

The interrater reliability estimates of the Accuracy scores
on all subsections are moderate to high, with the exception of the
Paragraph score. The highest correlation on both forms is for
Word and Phrase Translation. Correlations on Form 2 are lower
than on Form 1 for the Word and Phrase and Sentence subsections
and for the total; the Paragraph correlation is .61 on both forms.
The interrater reliability estimates for the total Accuracy score
are high for Form 1 (.92) and adequate for Form 2 (.83).

Table 6 presents the coefficient of equivalence of the 
Accuracy scores across forms and raters. These data indicate
the parallel form reliability of the ESVTE across
different raters.

Table 6

                   Form 2 Rater 1    Form 2 Rater 2

Form 1 Rater 1     .86               .87
Form 1 Rater 2     .84               .91

As can be seen, the coefficient of equivalence of the ESVTE
Accuracy score is quite high for a free-response test scored by a
single rater. That is, there is a high degree of agreement
across forms and raters. This suggests that ESVTE Accuracy
scores can be highly stable. Even under the most severe
circumstances, in which an examinee takes different forms of the
test and each form is scored once by a different rater, the
scores show a remarkable degree of agreement (.84 and .87). Thus,
it appears that the reliability of the ESVTE Accuracy score is
high.

In order to examine more efficiently the effects of rater
severity on the reliability of the ESVTE-Accuracy subsection, a
generalizability study (G-study) was undertaken on the total
ESVTE-Accuracy score. A G-study is a means of looking at
multiple sources of variance simultaneously. In this study, the 



Again, it should be remembered that the consistency of the
ESVTE Accuracy score is dependent on well-trained raters. In an
operational program, however, it should be possible to exceed the
reliability attained in this experimental study. Operational
raters will have the benefit of being able to train using the
rater training materials that were a by-product of this project.
In this study, the raters approached the task of rating without
the benefit of having undergone a rater training program.
Ratings were done on an intermittent basis at home as the raters' 
personal schedules permitted. 
two sources of variance investigated were forms and raters. The
results are presented in Table 7. 



Table 7

Variance Contributions of Raters and Forms
to the ESVTE-Accuracy Total Score

Source of Variance    Variance Component Estimate    Standard Error

Persons                     208.636                      47.75
Forms                        -4.912*                      4.30
Raters                       34.761                      33.08
Persons x Forms               5.620                       4.50
Persons x Raters              7.364                       4.82
Forms x Raters                9.929                       8.56
Residual                     23.357                       5.04

*A negative variance estimate is an artifact of the estimation
procedure. Generally these can be regarded as equivalent to zero
(Brennan, 1983, p. 103).

Table 7 shows that the variance due to the forms or to any
two-way interaction is relatively small in comparison to the
variance measured among the persons. Of these, the highest
variance component (9.929 for the form by rater interaction) is
only 4.75% as large as the largest (person) component and
represents only 3.4% of the total variance of 289.667. However,
the variance due to raters is somewhat large (34.761): it is 16.7%
as large as the person variance and represents 12% of the total
variance. Moreover, the residual variance (containing that due to
the three-way person by form by rater interaction and any random
variance) is also relatively large. These figures imply that,
while differences in scores due to forms were relatively minor,
raters were inconsistent with each other, although fairly
consistent across forms.
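The percentages above follow from Table 7 once the negative Forms estimate is set to zero, as the table note recommends; the remaining components then sum to the total of 289.667. Here is a Python sketch of that bookkeeping (the dictionary and function names are our own):

```python
# Variance component estimates from Table 7 (ESVTE-Accuracy total score).
TABLE7 = {
    "persons": 208.636,
    "forms": -4.912,
    "raters": 34.761,
    "persons x forms": 5.620,
    "persons x raters": 7.364,
    "forms x raters": 9.929,
    "residual": 23.357,
}


def variance_shares(components):
    """Clip negative estimates to zero, then express each component as a
    share of the total variance."""
    clipped = {name: max(value, 0.0) for name, value in components.items()}
    total = sum(clipped.values())
    return total, {name: value / total for name, value in clipped.items()}


total, shares = variance_shares(TABLE7)
print(round(total, 3))  # 289.667
for name, share in shares.items():
    print(f"{name:<17} {share:6.1%}")
print(f"raters vs persons: {TABLE7['raters'] / TABLE7['persons']:.1%}")  # 16.7%
```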
The variance components from a G-study can then be used
in a decision study (or D-study) to estimate the reliability
(generalizability coefficient) of a test under various
conditions of the facets being studied. Table 8 presents the
estimated generalizability coefficients, given both raters and
forms as sources of error, for combinations of one or two forms
and one or two raters.



Table 8

Estimated Generalizability Coefficients for the
ESVTE-Accuracy Score Using Different
Groupings of Forms and Raters

Number of    Number of    Generalizability
Forms        Raters       Coefficient

1            1            .85
1            2            .91
2            1            .91
2            2            .94



The results in Table 8 show that the reliability of the
ESVTE-Accuracy score when one form and two raters are used is
.91, given measurement errors due to both raters and forms. This
is very high for a rater-scored test. It may be noted that the
reliability using two forms and two raters (as was the case in
the validation study for the development of the ESVTE) was a very
high .94.
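The Table 8 values can be recomputed from the Table 7 components under two assumptions: a fully crossed persons x forms x raters design, and the usual relative (norm-referenced) generalizability coefficient. Only the person-related interactions and the residual enter the error term, and each is divided by the number of forms and/or raters averaged over. The sketch below is our reconstruction, not necessarily the procedure used in the original analysis. With the Table 7 estimates, it reproduces .85, .91, .91, and .94 to two decimals.

```python
def g_coefficient(var_p, var_pf, var_pr, var_pfr_e, n_forms, n_raters):
    """Relative generalizability coefficient for a fully crossed
    persons x forms x raters design (main effects of forms and raters
    do not enter relative error)."""
    relative_error = (var_pf / n_forms
                      + var_pr / n_raters
                      + var_pfr_e / (n_forms * n_raters))
    return var_p / (var_p + relative_error)


# Table 7: persons, persons x forms, persons x raters, residual.
COMPONENTS = (208.636, 5.620, 7.364, 23.357)

for n_forms, n_raters in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    g = g_coefficient(*COMPONENTS, n_forms, n_raters)
    print(n_forms, n_raters, round(g, 2))
```

For absolute decisions, such as applying a cut score, the rater and form main effects would also enter the error term, which would lower these values somewhat.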
6.3. Production Section: Descriptive Statistics and Reliability
of the Expression Score

Table 9 below shows the ESVTE-Expression descriptive 
statistics (raw scores) for the Production section of the test by 
form and by rater. In the Production section, only the Paragraph 
Translations are rated for Expression. They are rated for the 
three criteria that figure into the total score for Expression. 
These criteria are Grammar, Expression, and Mechanics. 



Table 9

Descriptive Statistics for ESVTE Expression: Paragraphs
Subsection, Form 1 and Form 2 (N=42)

Measure          Mean     Std. Dev.    Minimum    Maximum

Grammar
  R1 F1           8.9      3.6          3          15
  R2 F1           5.3      2.8          0          12
  R1 F2           7.1      3.8          0          15
  R2 F2           6.7      3.3          1          15

Expression
  R1 F1           7.2      2.7          3          15
  R2 F1           4.3      2.5          0          12
  R1 F2           5.3      3.0          0          15
  R2 F2           4.6      2.3          0          10

Mechanics
  R1 F1           9.0      3.6          2          15
  R2 F1           9.3      3.9          0          15
  R1 F2           7.1      3.9          0          15
  R2 F2           8.3      4.5          0          15

Total (for Expression, Production section)
  R1 F1          25.2      9.1          9          45
  R2 F1          18.9      8.6          0          39
  R1 F2          19.5     10.2          0          45
  R2 F2          19.7      9.3          4          39

Legend: R = rater, F = form. Thus R1 F1 is the score assigned by
rater 1 on form 1.
Close examination of Table 9 shows that, as in the Accuracy
scores, Rater 1 was more lenient than Rater 2 in all the
Expression subscores on the Production section except Mechanics.
The difference in Mechanics was slight for Form 1, but for Form 2
it was enough to make the final total scores almost equal on that
form.

Overall, Form 2 appears to be slightly more difficult than
Form 1. Averaging the scores assigned by both raters, the
Paragraph Translation Expression scores are slightly lower on
Form 2 for all three scoring criteria. For Grammar, the mean is
6.9 on Form 2 versus 7.1 on Form 1; for Expression, it is 4.95
versus 5.75; and for Mechanics, it is 7.7 versus 9.15. For the
total scores on this section, the mean is 19.6 on Form 2 and
22.05 on Form 1, a difference of 2.45 points. Given the large
standard deviations of the scores, this is probably not a
statistically significant difference.
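The form means quoted above are simple averages of the two raters' means in Table 9. The short Python sketch below reproduces that arithmetic. The dictionary layout and the helper name form_mean are illustrative. Only the published means are used; no examinee-level data are assumed.

```python
# Mean Paragraph Translation Expression scores from Table 9,
# keyed by criterion, then by (rater, form).
table9_means = {
    "Grammar":    {("R1", "F1"): 8.9,  ("R2", "F1"): 5.3,  ("R1", "F2"): 7.1,  ("R2", "F2"): 6.7},
    "Expression": {("R1", "F1"): 7.2,  ("R2", "F1"): 4.3,  ("R1", "F2"): 5.3,  ("R2", "F2"): 4.6},
    "Mechanics":  {("R1", "F1"): 9.0,  ("R2", "F1"): 9.3,  ("R1", "F2"): 7.1,  ("R2", "F2"): 8.3},
    "Total":      {("R1", "F1"): 25.2, ("R2", "F1"): 18.9, ("R1", "F2"): 19.5, ("R2", "F2"): 19.7},
}


def form_mean(criterion: str, form: str) -> float:
    """Average the two raters' means for one criterion on one form."""
    scores = table9_means[criterion]
    return (scores[("R1", form)] + scores[("R2", form)]) / 2


for criterion in table9_means:
    f1, f2 = form_mean(criterion, "F1"), form_mean(criterion, "F2")
    print(f"{criterion:<10} Form 1 = {f1:.2f}  Form 2 = {f2:.2f}  diff = {f1 - f2:.2f}")
```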

As in the discussion of the reliability of the Accuracy
scores, we will first look at interrater reliability and parallel
form reliability for Expression separately. Table 10 shows the
interrater reliability estimates (Pearson Product Moment
Correlations) of the ESVTE Production subsections and the total
Production section score for Expression. These scores are all
based on the Paragraph Translation subsection of the Production
section of the test. The reliability for Form 1 is listed first,
followed by the reliability for Form 2.
Table 10
Interrater Reliability of
ESVTE Production Subsections and Production Total (Forms 1+2)

                           Form 1   Form 2

Paragraphs-Grammar           .78      .53
Paragraphs-Expression        .83      .57
Paragraphs-Mechanics         .75      .68

Total Expression*            .84      .63

*Total for Expression is the total of the three Expression
subscores on Paragraphs only.

For Form 1, the interrater reliabilities for the three
Expression criteria are moderate to good. The correlation for
the total scores (.84) is quite acceptable. Interrater
consistencies for Form 2 are lower than those for Form 1 across
the board. This indicates that the raters were more consistent
when they were scoring Form 1 than Form 2.[a]

Table 11 presents the coefficient of equivalence of the
total Expression scores on the Production section across forms
and raters. These data are an indication of the parallel form
reliability of the ESVTE across different raters.

[a] It should be noted that interrater reliability is a rater
characteristic, not a test characteristic. Nevertheless, a test
developer must present information on interrater reliability. In
the future, the interrater reliability of the ESVTE will depend
on the reliability of the individuals who score the ESVTE.
Raters in the ESVTE operational program, however, will have the
advantage of training materials that were generated as a
by-product of this study. Thus, ESVTE operational raters should
exceed the reliability of the raters in this developmental study,
who approached the task without the benefit of a rater training
program. As a result, these raters may have used different
scoring standards at different points during the three months
that they spent rating the Production section. Ratings were done
on an intermittent basis at home.



Table 11 

Coefficient of Equivalence for ESVTE Expression Scores 
(Production Section only, N=42)

                   Form 2 Rater 1   Form 2 Rater 2

Form 1 Rater 1          .66              .83

Form 1 Rater 2          .70              .88



These data indicate that across forms, Rater 2 was more
consistent than Rater 1 (.88 versus .66 when the same rater scored
both forms). Across raters and forms, scores were moderately
consistent (.83 and .70).

In order to examine the combined effects of rater and form
interaction on the reliability of the ESVTE-Expression Production
section, a generalizability study (G-study) was undertaken on
the total ESVTE-Expression Production Score. As in the previous
G-study, the two sources of variance investigated were forms and
raters. The results are presented in Table 12.



Table 12 

Variance Contributions of Raters and Forms 
to the ESVTE-Expression Production Total Score 

Source of             Variance Component   Standard
Variance              Estimate             Error

Persons                 65.458             15.25
Forms                   -1.975*             4.80
Raters                   -.371*             5.63
Persons x Forms         -2.942*             3.27
Persons x Raters         -.028*             3.69
Forms x Raters           9.526              8.25
Residual                24.226              5.22

*The negative variance estimate is an artifact of the estimation 
procedure. Generally these can be regarded as equivalent to zero 
(Brennan, 1983, p. 103). 
Table 12 shows that the variance due to raters, forms, the
person-by-form interaction, and the person-by-rater interaction is
negligible. However, there is a relatively large amount of
variance in the residual, which contains both random error and
error caused by the three-way person-by-form-by-rater
interaction. This variance (24.226) is 37% as large as the
person variance and represents 24% of the total variance of
99.21 (the sum of the components, with the negative estimates set
to zero). Additionally, the variance due to the form-by-rater
interaction (9.526) is 15% as large as the person variance and
9.6% of the total. These results tend to indicate that raters
were not consistent in the way they ranked individuals across the
two forms and in the standards they applied to the two forms.
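The percentages in the preceding paragraph follow directly from Table 12. The short Python sketch below reproduces them. It assumes, as the table note indicates, that negative estimates are set to zero before summing. The variable names are illustrative.

```python
# Variance component estimates for the ESVTE-Expression Production total (Table 12).
components = {
    "persons": 65.458,
    "forms": -1.975,
    "raters": -0.371,
    "persons x forms": -2.942,
    "persons x raters": -0.028,
    "forms x raters": 9.526,
    "residual": 24.226,
}

# Negative estimates are treated as zero, following the table note.
clamped = {source: max(value, 0.0) for source, value in components.items()}
total = sum(clamped.values())

person = clamped["persons"]
for source in ("residual", "forms x raters"):
    v = clamped[source]
    print(f"{source}: {v / person:.0%} of person variance, {v / total:.1%} of total")
print(f"total variance = {total:.2f}")
```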

These results can be illustrated by comparing the total
Expression Production means in Table 9. On Form 1, Rater 1 is
much more lenient than Rater 2 (25.2 versus 18.9). On Form 2,
however, Rater 1 is much more strict than she is on Form 1 (19.5
versus 25.2), while Rater 2 becomes slightly more lenient on Form
2 (19.7 versus 18.9). In addition, on Form 2, Rater 2 is
slightly more lenient than Rater 1 (19.7 versus 19.5). These
results indicate that further training of raters in rating the
paragraphs for Expression will be necessary in the operational
program of the ESVTE. Otherwise, the reliability of the
Expression score on the Production section may be less than
satisfactory.

Table 13 presents the estimated generalizability
coefficients from a D-study based on the variance components
estimated above, given both raters and forms as sources of error,
for groupings of one or two forms and one or two raters.



Table 13 

Estimated Generalizability Coefficients for the
ESVTE-Expression Production Score using Different
Groupings of Forms and Raters

Number of   Number of   Generalizability
Forms       Raters      Coefficient

1           1           .73
1           2           .84
2           1           .84
2           2           .91



The results in Table 13 show that the reliability for the
total ESVTE-Expression score on the Production section, when one
form and two raters are used, is .84, given errors due to both
forms and raters. This is adequate for a rater-scored test. In
addition, two things should be noted. First, this score makes up
only part of the ESVTE total Expression score, since the Multiple
Choice section is also included in it. Second, the reliability
using two forms and two raters (as was the case in the validation
study for the development of the SEVTE) was a very high .91.
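A D-study of this kind can be sketched in Python. This sketch assumes the relative (rank-ordering) generalizability coefficient for a fully crossed persons by forms by raters design. It plugs in the Table 12 estimates and sets negative estimates to zero, as the table note suggests. The function name relative_g and the dictionary keys are illustrative. Under these assumptions the sketch gives approximately .73, .84, .84, and .915, closely matching Table 13. The last value is slightly above the reported .91. The difference may come from rounding or from details of the original computation that the report does not state.

```python
def relative_g(var: dict, n_forms: int, n_raters: int) -> float:
    """Relative generalizability coefficient for a crossed p x f x r design."""
    # Negative component estimates are set to zero before use.
    v = {k: max(x, 0.0) for k, x in var.items()}
    relative_error = (
        v["pf"] / n_forms
        + v["pr"] / n_raters
        + v["pfr,e"] / (n_forms * n_raters)
    )
    return v["p"] / (v["p"] + relative_error)


# Table 12 estimates; only the components that enter relative error are needed.
table12 = {"p": 65.458, "pf": -2.942, "pr": -0.028, "pfr,e": 24.226}

for n_f, n_r in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    print(n_f, n_r, round(relative_g(table12, n_f, n_r), 3))
```

The form and rater main effects and the form-by-rater interaction are left out because they do not affect rank ordering. An absolute-decision (phi) coefficient would add them to the error term.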

The final total ESVTE Expression score is a composite of an 
examinee's score on the Multiple Choice section of the test and 
the Production section total, discussed above. Most of the 
points that can be earned by an examinee in the ESVTE Expression 
score are earned in the Multiple Choice section; i.e., the 
Expression score is the sum of the three subscores in the 
Production section (maximum of 45 points) and the MC section raw
score (maximum of 60 points), as explained in section 1.3 of this
report. Because the total Expression score is a composite of the 
Multiple Choice section score and the Production score, it is not 
possible to calculate a single empirical estimate of the 
reliability of this composite score in the same convenient way 
that one might for a multiple choice test. There are, 
however, a number of ways of looking at the reliability of this 
composite score. 

First, in order to examine the effects of different raters 
on the consistency of the composite ESVTE Expression score, we 
can calculate the degree of agreement in composite Expression 
scores when different raters score the Production section. The 
correlation between the composite Expression scores, when the 
points awarded by each rater are added to scores obtained on the 
corresponding MC section, is .96 for Form 1 and .93 for Form 2. 
These correlations are quite high, suggesting that the composite
Expression score is quite stable across raters. This finding is
important for evaluating the reliability of the Expression score.
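The composite interrater check described above can be sketched in Python. The sketch uses hypothetical scores for four examinees, not data from the study. pearson is a hypothetical helper written out from the Pearson product-moment definition rather than taken from a library. Each rater's Production points are added to the same MC score, and the two composites are then correlated. Because both composites share the identical MC component, that component contributes common variance to both. This tends to raise their correlation above the Production-only correlation.

```python
from statistics import fmean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length score lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for four examinees (not data from the study).
mc = [50, 40, 30, 20]              # Multiple Choice raw scores (max 60)
production_r1 = [40, 30, 25, 10]   # Rater 1 Production Expression (max 45)
production_r2 = [35, 33, 20, 15]   # Rater 2 Production Expression (max 45)

# Each rater's Production points are added to the same MC score.
composite_r1 = [m + p for m, p in zip(mc, production_r1)]
composite_r2 = [m + p for m, p in zip(mc, production_r2)]

r_production = pearson(production_r1, production_r2)
r_composite = pearson(composite_r1, composite_r2)
print(f"Production only: r = {r_production:.3f}")
print(f"Composite:       r = {r_composite:.3f}")
```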

A second way is to look at the consistency of scores earned 
on the two different forms. This comparison produces an index 
known as the coefficient of equivalence or parallel form 
reliability. This coefficient of equivalence is represented in 
Table 14 below. 
Table 14

Coefficient of Equivalence for ESVTE Expression Composite Scores

(N=42)

                   Form 2 Rater 1   Form 2 Rater 2

Form 1 Rater 1          .87              .92
Form 1 Rater 2          .87              .93



This table depicts the four indexes of equivalence that can 
be calculated when each of two test forms is scored by two 
raters. For example, the correlation between total scores when 
rater 1 scores both Form 1 and Form 2 is .87. As can be seen, 
the average coefficient of equivalence is about .90. 

A final way to examine the reliability of the composite 
Expression score is to look at the internal consistency of the 
two part scores (MC and Production) combined to form the 
composite using coefficient alpha. This views the composite 
score as composed of two subsections. Calculated in this manner, 
coefficient alpha for Form 1 is .89; for Form 2 it is .(7. (Note 
that to form the total scores for Expression, the production 
section scores awarded by the two raters have been averaged.) 
These high internal consistency estimates for the total
Expression score indicate that the two subtests (MC and
Production) of this section appear to be measuring the same
thing. This finding justifies the formation of a composite score
by adding them together.
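Coefficient alpha for a composite of k parts is alpha = k/(k-1) * (1 - sum of the part variances / variance of the total). A minimal Python sketch follows. It assumes sample variances and takes the unweighted sum of the parts as the composite. The report says only that the composite was treated as two subsections, so these are assumptions. The data are hypothetical, and coefficient_alpha is an illustrative name.

```python
from statistics import variance


def coefficient_alpha(parts: list[list[float]]) -> float:
    """Coefficient alpha for a composite formed by summing k parts.

    parts[i][j] is examinee j's score on part i.
    """
    k = len(parts)
    totals = [sum(scores) for scores in zip(*parts)]   # composite per examinee
    sum_part_vars = sum(variance(p) for p in parts)
    return (k / (k - 1)) * (1 - sum_part_vars / variance(totals))


# Hypothetical two-part example: MC scores and averaged Production scores.
mc_scores = [1, 2, 3, 4]
production_scores = [2, 1, 4, 3]
print(round(coefficient_alpha([mc_scores, production_scores]), 3))
```

With only two parts, alpha depends entirely on how strongly the parts covary relative to their own variances. Two perfectly parallel parts give an alpha of 1.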
7. Examining the Validity of the ESVTE

According to the Standards for Educational and Psychological 
Testing (American Educational Research Association, et al.,
1985), test validity refers to "the appropriateness,
meaningfulness and usefulness of the specific inferences made 
from test scores" (p. 9). Validity is demonstrated by an 
accumulation of evidence that supports the claim of validity for 
a particular test. Some of this evidence is empirical. Other 
evidence may be qualitative, in that it deals with the content of 
the test, or it may be theoretical in that it deals with a 
theory about the nature of the trait being measured by the test. 
In the case of the ESVTE, the central validity concern is the 
claim that the test is a measure of the ability to translate a 
written text in English into correct and appropriate Spanish. 

Traditionally, three types of validity are identified according
to how the evidence was gathered: content validity,
criterion-related validity, and construct validity. Construct
validity, which "focuses primarily on the test score as a measure
of the psychological characteristic of interest" (AERA, et al.,
p. 9), may be understood to subsume the other two types; i.e.,
content and criterion-related validity are also evidence of the
construct validity of a test. Thus, construct validity is of
central interest. We will work toward a discussion of the
construct validity of the ESVTE by beginning with an analysis of
its content validity. Subsequently, we will examine the construct
validity of the test more directly, through analyses of the trait
that is being measured by the test. Finally, we will examine the
criterion-related validity of the ESVTE by considering its
relationship to success at translating and to other measures of
language proficiency.
7.1. Content Validity

Content validity is evidence that demonstrates the degree to
which the sample of items, tasks, or questions on a test is
representative of the domain of content that could be tested. In
the case of the ESVTE, evidence for its content validity is found 
in the tasks examinees are asked to perform to demonstrate their 
ability to translate from English to Spanish. 

First, the Multiple Choice section involves two general
tasks required of English/Spanish translators: recognizing
whether a proposition in English is rendered into Spanish with
appropriate expression, and recognizing errors in written
Spanish. Clearly, the ability to select the appropriate word or
phrase from among the many that could be available or correct in
other contexts is a skill that a translator must have. A
translator uses this ability to recognize infelicities in his or
her work in order to revise it successfully. In addition, the
ability to recognize errors in Spanish is important because the
translator must be able to revise his or her first draft so that
it represents appropriate Spanish expression. Otherwise, the
translator's Spanish rendition can be accurate in terms of the
content of the source document, but it will still appear to be a
translation.
The Multiple Choice section contains 60 items: 35 Words or
Phrases in Context (WPC) items and 25 Error Detection (ED) items.
WPC items test a wide variety of points of Spanish and English
grammar. These points include subject-verb agreement, verb
tenses, pronouns, prepositions, gender, and word order. They
also test a range of English-Spanish vocabulary, including nouns,
verbs, adverbial and adjectival phrases, and false cognates. Each
item on each of the two forms of the test focuses on the same or
nearly the same aspect of grammar or vocabulary. The 25 ED items
include errors of grammar, word order, vocabulary, punctuation, or
spelling. Thus, of the seven criteria included in the Translation
skill level descriptions (accuracy, grammar, vocabulary, style,
tone, spelling, and punctuation) developed for this project, these
Multiple Choice items test all except style and tone.[b] (For
additional information relevant to the content validity of the
Multiple Choice section, see the content analysis in Appendix D.)

[b] One way that vocabulary is tested is through the
mistranslation of words. Mistranslation involves both the
vocabulary and accuracy aspects of the SLDs. Thus, the construct
of Accuracy is partly represented in the content of the Multiple
Choice section.

Second, apart from the ability to identify correct and
incorrect expression, the ability to produce a correct
translation is clearly required of a translator. This ability is
assessed through 28 direct production tasks. Fifteen of these
tasks involve the translation of a word or a phrase within a
sentence, called Word and Phrase Translation (WPT); 10 involve the
Spanish translation of complete English sentences (called
Sentence Translation or ST) that range in length from 8 to 17
words; and 3 tasks require Paragraph Translation (PT), the ability
to produce a Spanish translation of a paragraph in English. The
three paragraphs range in length from approximately 70 to 90
words.

The 15 Word and Phrase Translation (WPT) items and the 10
Sentence Translation (ST) items present examinees with a variety
of problems in vocabulary, idioms, grammar (morphology), and
syntax. We judged the sentences to range in difficulty from 2+
to 4+ on the FBI/CAL Translation Skill Level Descriptions, based
on the frequency and complexity of the language they employ and
the difficulty the language presents to the translator.[c] The
items in each section are arranged in order of perceived
difficulty on the FBI/CAL SLDs. Corresponding items on the two
forms are parallel in content and perceived difficulty.

[c] As indicated by Stansfield and Liskin-Gasparro in Duran et
al. (1985), it is heretical to the ACTFL/ILR SLDs to classify
decontextualized language, such as words, phrases, or sentences,
on the ILR scale. Still, for research or training purposes it is
sometimes necessary to do this. An appropriate disclaimer of
these difficulty levels is therefore noted here.

For WPT items, item developers relied on their expertise as
translators and as language teachers in order to develop
appropriate items. They created items that test aspects of the
language that present special difficulty when translated into the
target language, often cases where there is no direct equivalent.
For example, the expression "priced in the teens" has no direct
equivalent in Spanish, and use of a dictionary would not be
helpful. In this case, the translator must use his or her
knowledge of both languages to construct an appropriate
translation.

The ST items were constructed to include grammar problems 
that have traditionally created difficulties for translators and
language students because of a lack of congruence between the two 
languages. Such problems include pronouns, verb tenses and 
sequences of verb tenses, use of negatives, possessives, 
prepositions, and nontemporal verb forms, such as infinitive, 
gerund, and past participle. 

The first Paragraph Translation (PT) text is a newspaper 
account, using mature vocabulary and syntax, of a crime that 
occurred in a Spanish-speaking country. The subject of the crime 
is hijacking or sabotage, depending on the form of the test.
This text was judged to be a low level 3 text based on the ILR 
SLDs for reading. 

The second PT text is political/philosophical in nature. It 
deals with either the Armed Forces or ecology. The difficulty
level of this text was judged to be 3+.

The third PT text is a law or a legal interpretation of a 
law. The difficulty of this document is considered to be at the 
4+ or 5 level on the ILR skill level descriptions for reading.
Thus, the third text is clearly the most difficult.
The entire Production section is scored using scoring
guidelines (see Appendix F) that are based on the level
descriptions in the FBI/CAL Translation Skill Level Descriptions 
(see section 1.4 and Appendix I). The guidelines for scoring all 
the paragraphs include nearly all of the criteria included in the 
Translation SLDs. These descriptions were developed over a 
period of six months and represent a consensus among experienced 
translators and translation test evaluators. 

The text material that appears on the ESVTE was influenced
by the results of the survey of FBI translation needs (see
Appendix Q and section 1.3 of this report). Twenty-eight Language
Specialists responded to this questionnaire. The results
indicated that the written materials the respondents most often
deal with involve politics, narcotics, terrorism, foreign
counterintelligence, written laws, theft, and organized crime.
Some of the ESVTE texts were provided by the FBI, and those found
by CAL staff were judged relevant by FBI Language Specialists.
Texts found by CAL staff were taken from two sources: public
documents such as newspapers and magazines, and documents that
item writers had actually translated in their work. The selection
of texts from public documents was guided by sample texts provided
by the FBI, especially in terms of vocabulary. These texts, as
well as the texts that item writers had previously translated on
the job, were edited slightly to make them more suitable for
these tests. The third paragraph, a legal document written in
appropriate jargon (sometimes referred to as "legalese" among
government linguists), was supplied by the FBI. To make the ESVTE
as parallel as possible to the SEVTE, CAL staff located similar
legal documents in English and Spanish for the different forms of
the two test batteries.

It is interesting to examine the responses of the validation
study subjects (agents, contract linguists, and Language
Specialists) to the exam feedback questionnaire they completed
after taking the test (see Appendix L). On this questionnaire,
37% either agreed or strongly agreed with the statement, "The
material in the exams was representative of the types of written
documents I might encounter in my work." The remaining 63% either
disagreed or strongly disagreed with the statement. It is
difficult to interpret these data in terms of job relevance.
Judgments of the job relevance of a test are highly dependent on
the relationship between the test and the job of the individual
subject, and the subjects in the sample varied greatly in the
agency they worked for and in the job they performed. It must be
remembered that within the sample of 42 examinees, 31% were FBI
Special Agents, 26% were FBI Language Specialists (or contract
linguists who do similar work), 24% were FBI support staff, and
12% were members of the Houston Police Department. The ESVTE was
designed with the knowledge that it would be taken principally by
potential and current Language Specialists and others who might
wish to demonstrate the ability to do the type of translation
that Language Specialists regularly do. Yet due to the shortage
of Language Specialists within the FBI, Language Specialists made
up only 26% of the validation study sample. Under the
circumstances, the responses to the job relevance question on the
exam feedback questionnaire are not as negative as might have
been expected.

One of the subjects wrote on the questionnaire: "The
vocabulary used is not representative of that encountered in my
work. The person who passes this exam will do great in the
diplomatic field or as a translator in a federal court, but most
probably will not be able to deal with the language heard on a
Title III."[d] This telltale comment, apparently written by a
Special Agent, represents the perception that the test reflects
sophisticated written language rather than the spoken language
that FBI Special Agents involved in drug cases are normally asked
to monitor or summarize. The translation of most sophisticated
written documents is done by Language Specialists, rather than
Special Agents. Thus, the above comment reflects the discrepancy
between the job of the individuals in the validation study sample
and the job of the individuals who will eventually be selected by
the test.

[d] A Title III is an authorized wiretap.

At the same time, it is noteworthy that there was more
general agreement that the test measured translation ability.
Fifty-eight percent of the subjects either agreed or strongly
agreed with the statement "There was sufficient opportunity for me
to demonstrate my ability to translate from English to Spanish."
It may be that the 42% who disagreed with this statement did so
because they felt unduly restricted by the time constraints of
the testing situation; 40% of the subjects felt the length of
time given for the Production section was "too short," none felt
it was "too long," and 60% felt it was "about right." (It may be
noted that on the Multiple Choice section, examinees were
markedly more positive about the length of time given, with 81%
indicating it was "about right," and only 10% responding that it
was "too short.")

In interpreting the responses to the examinee questionnaire, 
it is important to note that approximately 15% of those who took 
the ESVTE in the validation study had received scores of 2+ or 
less on the Spanish OPI (see section 7.2 below). These subjects 
may have understandably felt pressured by the exam time 
constraints, since nearly all of the tasks on the test were above 
their level of ability. On the other hand, those subjects whose 
proficiency was very high may not have had sufficient time to 
revise their translations. Indeed, several of the examinees 
indicated this to test administrators, who in turn reported it to 
CAL on the test administrator report form. Because of this, CAL 
has recommended that the amount of time allowed for completing 
the Paragraph Translation subsection be increased from 37 to 48 
minutes; i.e., 11 minutes more than examinees in the validation 
study sample were permitted. This may have the effect of raising 
scores on the test somewhat.[e]

[e] The general increase in test scores that may be obtained by
increasing the time available to examinees to complete the test
should be viewed positively. If scores do increase under extended
time limits, this will likely be due to a reduction in test
speededness, and the scores will be more accurate. For additional
information, see Appendix P.

The implications for test validity of the responses to the
examinee questionnaire are lessened by the fact that a) most
examinees in the validation sample were not Language Specialists,
b) because of this, many had low ability in written translation,
and c) the test was too speeded. This last problem has been
corrected on the current form of the test by increasing the time
limit for the Paragraph Translations from 37 to 48 minutes.

7.2. Construct Validity

Traditionally, validity has been defined as the degree to which
a test measures what it claims to measure. Evidence of validity
has been divided into three types: content validity, construct
validity, and criterion-related validity. However, during the
past 15 years, validity has come to refer to the inferences that
can legitimately be made from test scores for a particular type
of examinee and for a particular purpose. Similarly, construct
validity has become synonymous with validity itself (Messick,
1980). Because of this, the same definition is also the
contemporary definition of construct validity. However, within
the context of the validity section of this report, we have made
use of the traditional division of kinds of validity in order to
organize a fairly complex presentation of the evidence for
validity that was gathered. Thus, here we consider the more
limited, traditional definition of construct validity; that is,
the dimensions of ability that are being measured by the test.

In the introduction to this report we identified and 
described two dimensions of translation ability: Accuracy and 
Expression. We discussed how these dimensions evolved from our 
efforts to develop Translation SLDs, from our research on the 
Listening Summary Translation Exam, and from our initial scoring
of the SEVTE test papers. These two dimensions of translation
ability were strongly supported by the results of our analyses of
the SEVTE test data (Stansfield et al., 1990b). Thus, we begin
this analysis of the construct validity of the ESVTE by stating
that the test claims to measure overall translation ability, but
that it divides this ability into two dimensions (Accuracy and
Expression) and it claims to measure each. Accuracy is the
degree to which the information in the source document is
conveyed in the target document. Errors in Accuracy include the
misrepresentation or deletion of information in the source
document, or the inclusion of information that was not in the
source document. Expression, on the other hand, focuses on the
appropriateness of the language used in the target document. 

When a test measures two distinct dimensions, the measures
of those dimensions should demonstrate some unique score variance.
Thus, while the measures may be related, they should be
distinguishable. Table 15 below presents the correlations
between the total scores for Accuracy and Expression for Forms 1
and 2 of the ESVTE.



Table 15 

Correlations between Mean Total Expression and Accuracy Scores
on Form 1 and Form 2
(N = 42)

            TOTEXPF1   TOTEXPF2   TOTACCF1   TOTACCF2

TOTEXPF1      1.00
TOTEXPF2       .93       1.00
TOTACCF1       .96        .94       1.00
TOTACCF2       .92        .90        .93       1.00

Legend: TOTEXPF1 = Total Expression Score, Form 1
        TOTEXPF2 = Total Expression Score, Form 2
        TOTACCF1 = Total Accuracy Score, Form 1
        TOTACCF2 = Total Accuracy Score, Form 2

As can be seen in Table 15, the correlation between these
two total scores for Form 1 is .96, while for Form 2 it is .90.
These high correlations (the average of which is .93) suggest
that the two subscores are measuring the same ability. This
finding is further corroborated by examining the correlation
between the two scores that claim to represent the Accuracy
dimension and the two scores that claim to measure the Expression
dimension. Note that the correlation between the Accuracy score
on Form 1 and the Accuracy score on Form 2 is .93. Similarly,
the correlation between the Expression total score on Form 1 and
the Expression total score on Form 2 is also .93. These
correlations between measures of the same dimension are exactly
the same as the average correlation between the two measures of
different dimensions mentioned above. Thus, since each measure
correlates as highly with a measure of another dimension as it
does with a measure of the same dimension, it is not possible to
claim, based on these data, that the ESVTE measures two dimensions
of translation ability. (The cause of the different finding for
the SEVTE and the ESVTE will be explained later.) Rather, it
appears that each subscore is a measure of the same global trait
being measured by the test.
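The comparison in the preceding paragraph sets same-dimension correlations against different-dimension correlations, in the spirit of a multitrait-multimethod check. It can be reproduced from Table 15 with a short Python sketch. The dictionary layout and the corr helper are illustrative. Means are rounded to two decimals because the table reports only two. This also keeps floating-point noise from producing a spurious difference.

```python
# Correlations from Table 15 (N = 42), keyed by unordered pairs of score names.
r = {
    frozenset({"TOTEXPF1", "TOTEXPF2"}): 0.93,
    frozenset({"TOTEXPF1", "TOTACCF1"}): 0.96,
    frozenset({"TOTEXPF1", "TOTACCF2"}): 0.92,
    frozenset({"TOTEXPF2", "TOTACCF1"}): 0.94,
    frozenset({"TOTEXPF2", "TOTACCF2"}): 0.90,
    frozenset({"TOTACCF1", "TOTACCF2"}): 0.93,
}


def corr(a: str, b: str) -> float:
    return r[frozenset({a, b})]


# Same dimension measured on different forms (convergent evidence).
same_dimension = [corr("TOTACCF1", "TOTACCF2"), corr("TOTEXPF1", "TOTEXPF2")]
# Different dimensions measured on the same form; these should be
# noticeably lower if Accuracy and Expression were distinct dimensions.
different_dimension = [corr("TOTEXPF1", "TOTACCF1"), corr("TOTEXPF2", "TOTACCF2")]

mean_same = sum(same_dimension) / len(same_dimension)
mean_diff = sum(different_dimension) / len(different_dimension)
gap = round(mean_same - mean_diff, 2)
print(f"same dimension: {mean_same:.2f}, different dimension: {mean_diff:.2f}, gap: {gap:.2f}")
```

A gap of zero matches the report's conclusion that the data do not support two separable dimensions.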

We will now turn to a discussion of criterion-related 
validity. This discussion provides a better understanding of the
global trait being measured and how it relates to other relevant 
traits. It also permits a better understanding of the effect of 
the characteristics of the validation study sample on the global 
trait identified through the analysis of the data collected. 
7.3. Criterion-related Validity 

Criterion-related validity is evidence that "demonstrates
that test scores are systematically related to one or more
outcome criteria" (AERA, et al., p. 11). For example, if
supervisors' ratings of employees' translation ability were
available, then it would be important to see how scores on the
ESVTE and supervisors' ratings compared. Unfortunately, the
Special Agent in Charge at each local FBI office is rarely able to
rate the translation ability of Language Specialists or Special
Agents, because a variety of languages may be represented in each
field office. Thus, an appropriate existing criterion variable was
not available to the authors of this study.

In an effort to remedy this situation, we constructed two
concurrent measures that can serve as variables for determining
criterion-related validity. These concurrent criterion-related
variables are described below.

Concurrent Criterion-Related Measures 

Overall FBI/CAL Expression and Accuracy Scores (EXPFBICAL and ACCFBICAL). After the two raters in the validation study assigned analytical scores to each subsection of the Production section of the ESVTE, they assigned each examinee two overall scores on the FBI/CAL Translation SLDs: one for Expression and one for Accuracy, based on the examinee's performance on the Sentences and Paragraphs subsections of the Production section. Each examinee took two forms. Thus, each examinee's overall FBI/CAL Expression and Accuracy score is the average of four ratings (two raters by two different forms). These overall FBI/CAL Expression and Accuracy scores were obtained for all subjects. They provide two measures of criterion-related validity.
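The composite criterion is the mean of a 2 x 2 grid of ratings (two raters by two forms), and a validity coefficient such as those in Table 16 is, in standard practice, a Pearson product-moment correlation between that composite and an ESVTE score. The Python sketch below shows both steps. The function names, the (form, rater) dictionary layout, and the four examinees are hypothetical, chosen only to illustrate the arithmetic; they are not the study's data or software.

```python
from math import sqrt

def composite_rating(ratings):
    """Average the overall ratings an examinee received.

    ratings: dict mapping (form, rater) -> numeric rating,
    e.g. {(1, "A"): 2.0, (1, "B"): 2.8, (2, "A"): 2.0, (2, "B"): 2.0}.
    """
    values = list(ratings.values())
    return sum(values) / len(values)

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical examinees: four ratings each, plus an ESVTE total score.
examinees = [
    ({(1, "A"): 1.0, (1, "B"): 1.8, (2, "A"): 1.0, (2, "B"): 1.0}, 40),
    ({(1, "A"): 2.0, (1, "B"): 2.8, (2, "A"): 2.0, (2, "B"): 2.0}, 55),
    ({(1, "A"): 3.0, (1, "B"): 3.0, (2, "A"): 2.8, (2, "B"): 3.8}, 70),
    ({(1, "A"): 4.0, (1, "B"): 3.8, (2, "A"): 4.0, (2, "B"): 4.8}, 88),
]
criterion = [composite_rating(r) for r, _ in examinees]
esvte = [score for _, score in examinees]
print(round(pearson_r(esvte, criterion), 2))
```

Averaging before correlating is what gives the composite its extra reliability: rater and form idiosyncrasies partly cancel across the four ratings.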



The data on the two concurrent criterion-related validity measures provide a basis for assessing the criterion-related validity of the ESVTE. Correlations between the Total Accuracy and Expression scores on each form of the ESVTE and these concurrent measures are presented in Table 16 below.






Table 16

Correlations of the ESVTE Scores
with Overall Rating of Translation Ability

(N = 42)

          EXPFBICAL   ACCFBICAL
EXP1        .91*        .91*
EXP2        .90*        .91*
ACC1        .93*        .92*
ACC2        .88*        .91*

* p < .0001

Before beginning a discussion of the relationships in Table 
16, it is appropriate to consider the validity and reliability of 
the two measures of criterion-related validity (EXPFBICAL and 
ACCFBICAL) . 

As indicated in the description of the FBI/CAL overall 
Expression and Accuracy ratings, after scoring each paper 
analytically, the raters then referred to the FBI/CAL Translation 
SLDs to determine an appropriate holistic rating for each 
examinee based on his or her performance on the Sentences and 
Paragraphs subsections of the Production section of the test. 
This holistic rating is a rating of overall translation ability 
based on performance in translating 10 challenging sentences and 
three paragraphs of varying difficulty. Thus, this holistic 
rating can be considered a performance-based assessment of 
translation ability. Its validity as such is limited slightly by 
the fact that of the four ratings (two ratings on each form) that 






go into this composite holistic rating, two were awarded by the same raters who scored the form that is correlated with the holistic rating in Table 16. Thus, two of the ratings are not wholly independent. However, the other two ratings were based on success at translating different texts. In this case, the different texts were the sentences and paragraphs appearing on the other ESVTE form. While one approach might have been to use the FBI/CAL skill level assigned by the two raters who scored the other form as the criterion variable (as discussed in footnote 33), we chose to combine all four ratings from the two forms into a single indicator of translation skill level in this study. This composite rating has the advantage of being based on twice as many performance tasks (20 sentences and six paragraphs) and twice as many ratings of translation skill level; that is, four ratings instead of two. Thus, this composite rating of translation skill level can be considered both more reliable and more valid because of the number of tasks and evaluations (ratings) on which it is based.
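The claim that doubling the tasks and ratings makes the composite more reliable can be illustrated with the classical Spearman-Brown prophecy formula. This is a sketch under stated assumptions: the formula treats the added form and its ratings as fully parallel to the originals, which the G study reported below does not assume (it models forms and raters as separate facets), so its projection need not match the G coefficient of .88. The single-form value of .84 from footnote 33 is used only as an illustrative input.

```python
def spearman_brown(reliability, k):
    """Projected reliability when a measure is lengthened by a factor of k.

    Classical Spearman-Brown prophecy formula: k*r / (1 + (k - 1)*r),
    valid under the assumption that the added parts are parallel.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie in [0, 1]")
    if k <= 0:
        raise ValueError("k must be positive")
    return k * reliability / (1 + (k - 1) * reliability)

# Illustrative input only: single-form G coefficient for EXPFBICAL.
single_form = 0.84
print(round(spearman_brown(single_form, 2), 2))  # -> 0.91
```

The classical projection (.91) exceeds the G-study value (.88), which is consistent with some rater-related error not shrinking when only the number of forms is doubled.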

In order to determine the reliability of the criterion variables, i.e., the composite FBI/CAL overall rating of translation ability for Accuracy and Expression, a Generalizability (G) study was performed on the data that went into the composite rating. The results of the G study, using forms and raters as facets, with 42 persons, 2 forms, and 2 raters, indicated that the G coefficient for the EXPFBICAL rating is .88. For the ACCFBICAL rating the G coefficient is .89. These G coefficients may be considered the reliability of these two criterion variables.
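The report gives only the resulting coefficients, not the design details or variance components. The Python sketch below shows how such a coefficient can be computed, assuming a fully crossed persons x forms x raters random-effects design and the relative (norm-referenced) G coefficient. The function name, the convention of setting negative variance estimates to zero, and the synthetic ratings are assumptions for illustration, not the study's procedure or data.

```python
import numpy as np

def g_coefficient_pxfxr(scores, n_forms=None, n_raters=None):
    """Relative G coefficient for a fully crossed persons x forms x raters design.

    scores: array of shape (n_persons, n_forms, n_raters), one rating per cell.
    n_forms, n_raters: D-study sizes to project to (default: sizes observed).
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_f, n_r = x.shape
    if min(n_p, n_f, n_r) < 2:
        raise ValueError("need at least 2 persons, 2 forms and 2 raters")
    n_forms = n_f if n_forms is None else n_forms
    n_raters = n_r if n_raters is None else n_raters

    grand = x.mean()
    m_p = x.mean(axis=(1, 2))  # person means
    m_f = x.mean(axis=(0, 2))  # form means
    m_r = x.mean(axis=(0, 1))  # rater means
    m_pf = x.mean(axis=2)      # person-by-form means
    m_pr = x.mean(axis=1)      # person-by-rater means
    m_fr = x.mean(axis=0)      # form-by-rater means

    # Sums of squares for the effects that enter the relative G coefficient.
    ss_p = n_f * n_r * np.sum((m_p - grand) ** 2)
    ss_pf = n_r * np.sum((m_pf - m_p[:, None] - m_f[None, :] + grand) ** 2)
    ss_pr = n_f * np.sum((m_pr - m_p[:, None] - m_r[None, :] + grand) ** 2)
    resid = (x - m_pf[:, :, None] - m_pr[:, None, :] - m_fr[None, :, :]
             + m_p[:, None, None] + m_f[None, :, None] + m_r[None, None, :] - grand)
    ss_pfr = np.sum(resid ** 2)

    ms_p = ss_p / (n_p - 1)
    ms_pf = ss_pf / ((n_p - 1) * (n_f - 1))
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))
    ms_pfr = ss_pfr / ((n_p - 1) * (n_f - 1) * (n_r - 1))

    # Variance components from expected mean squares; negatives set to 0.
    var_pfr = ms_pfr
    var_pf = max((ms_pf - ms_pfr) / n_r, 0.0)
    var_pr = max((ms_pr - ms_pfr) / n_f, 0.0)
    var_p = max((ms_p - ms_pf - ms_pr + ms_pfr) / (n_f * n_r), 0.0)

    rel_error = var_pf / n_forms + var_pr / n_raters + var_pfr / (n_forms * n_raters)
    return var_p / (var_p + rel_error)

# Synthetic data only: 42 persons, 2 forms, 2 raters.
rng = np.random.default_rng(0)
true_level = rng.normal(2.0, 0.8, size=(42, 1, 1))
ratings = true_level + rng.normal(0.0, 0.3, size=(42, 2, 2))
print(round(g_coefficient_pxfxr(ratings), 2))             # 2 forms, 2 raters
print(round(g_coefficient_pxfxr(ratings, n_forms=1), 2))  # 1 form, 2 raters
```

Projecting to one form (`n_forms=1`) corresponds to the single-form coefficients mentioned in footnote 33; with real data the two calls would be expected to show the same direction of difference as .88 versus .84.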

Returning now to Table 16, the correlations between the criterion variables (EXPFBICAL and ACCFBICAL) and the ESVTE Expression and Accuracy scores are consistently high. Of the eight correlations shown, the lowest is .88 and the highest is .93. The correlation between the ESVTE Expression score and the Expression criterion variable (EXPFBICAL) is .91 for Form 1 and .90 for Form 2. This is strong evidence of the validity of the ESVTE Expression score. Similarly, the correlation between the ESVTE Accuracy score and the Accuracy criterion variable (ACCFBICAL) is also high: .92 for Form 1 and .91 for Form 2. This is strong evidence for the validity of the ESVTE Accuracy score.³³ The fact that scores on the ESVTE correlate highly with overall translation skill level ratings supports the validity of the two scores.

³³ Although we chose to use the average of the four overall FBI/CAL translation ability level ratings here as a criterion variable, it is interesting to consider the correlations between the ESVTE Expression and Accuracy scores on one form and the overall FBI/CAL translation ability level ratings assigned by the raters based on the examinee's performance on the other form. In this case, the other form is a totally independent criterion variable. That is, the rating is based on the examinee's performance on other translation tasks similar to those which the examinee would have to perform on the job.

Here the validity coefficients are also quite good. The correlation between the ESVTE Expression total based on Form 1 and the average of the two overall FBI/CAL translation skill level ratings assigned based on Form 2 Sentences and Paragraphs is .87. Similarly, the correlation between the Expression total based on Form 2 and the average of the two overall FBI/CAL translation skill level ratings assigned based on Form 1 Sentences and Paragraphs is .90.

The correlation between the ESVTE Accuracy total based on Form 1 and the average of the two overall FBI/CAL translation skill level ratings assigned based on Form 2 Sentences and Paragraphs is .91. Similarly, the correlation between the Accuracy total based on Form 2 and the average of the two overall FBI/CAL translation skill level ratings assigned based on Form 1 Sentences and Paragraphs is .88.

Again, it must be remembered that these overall FBI/CAL translation skill level ratings are less reliable than those included in table 4.7. The G study showed the G coefficient with one form and two ratings to be .84 for EXPFBICAL and .83 for ACCFBICAL.

7.4. Convergent/Discriminant Validity

Because the evidence in Table 16 so clearly supports the validity of the ESVTE as a measure of English to Spanish translation ability, a fuller discussion of evidence for the construct validity of the test is warranted. Such a discussion can be undertaken by considering the convergent/discriminant nature of the correlations between the ESVTE and other measures that theoretically should or should not show a relationship to the construct of interest. In such a discussion, the expected correlation of the test with each variable is analyzed and discussed. Some criteria will be expected to show a strong relationship with the test whose validity is being examined, while other criteria will be expected to show a weak correlation, not to correlate at all, or even to correlate negatively. We will make use of the convergent/discriminant validity approach here in order to fully examine the construct validity of the ESVTE.
In an effort to attain further understanding of the construct measured by the ESVTE, two concurrent measures were collected. These concurrent measures are described below.

Concurrent Measures

1. A self-rating (SPENSELF and ENSPSELF). CAL developed two questionnaires that asked subjects a) with what types of documents they had experience translating from Spanish into English and from English into Spanish; and b) if they had experience, to rate their ability to translate these documents as either "Limited," "Functional," "Competent," or "Superior." These questionnaires were administered to the subjects immediately preceding the administration of the first part of the corresponding test. A copy of these questionnaires is contained in Appendix N. Each subject's responses to these two questionnaires were converted into self-rating scores (Spanish into English = SPENSELF; English into Spanish = ENSPSELF) by first awarding points to each item that the subject rated (1 for "Limited," 2 for "Functional," 3 for "Competent," 4 for "Superior," with N/A receiving no value) and then calculating the mean response to all items for which he or she provided a self-rating.
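The scoring rule just described translates directly into code. This is a minimal Python sketch, assuming each response arrives as one of the four rating labels or the string "N/A"; the function name and the sample answers are hypothetical. Returning `None` when every item is N/A is our own choice, since the report does not say how such a case was handled.

```python
# Point values described in the text; "N/A" responses receive no value.
POINTS = {"Limited": 1, "Functional": 2, "Competent": 3, "Superior": 4}

def self_rating_score(responses):
    """Mean self-rating over the document types the subject actually rated.

    responses: iterable of strings, one per document type, each either a
    label from POINTS or "N/A". Returns None if every item is "N/A".
    """
    rated = [POINTS[r] for r in responses if r != "N/A"]
    if not rated:
        return None
    return sum(rated) / len(rated)

# Hypothetical ENSPSELF responses to 10 document types.
answers = ["Competent", "N/A", "Functional", "Superior", "N/A",
           "Competent", "N/A", "Limited", "Competent", "N/A"]
print(self_rating_score(answers))
```

Because N/A items are dropped rather than imputed, the divisor varies from subject to subject; a subject who rated only three document types gets a score based on three responses.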

In addition, data were collected, where available, on six nonconcurrent tests that had been administered within one to eight years of the study.



Previously Administered Tests 

1. A Spanish OPI score (SPANSPK). An oral proficiency interview (OPI) score for Spanish was collected for as many subjects as possible. Although this is not a wholly adequate criterion variable, it is relevant to translation ability. For adult second language learners, speaking proficiency assumes, and is moderately correlated with, Spanish reading proficiency. Correlations between the two skills typically are between .50 and .75. Thus, on a theoretical basis, it was decided that the OPI score could be used to provide additional evidence of criterion-related validity. For all ILR scores in this study, the following conversion was used for purposes of empirical analyses:

ILR Score   Numerical Score
0+          0.8
1           1.0
1+          1.8
2           2.0
2+          2.8
3           3.0
3+          3.8
4           4.0
4+          4.8
5           5.0



2. Other test scores. Other scores that measure possibly related constructs were collected where possible. None of these scores could be collected for all the subjects, however. These scores, the number of subjects for whom they were collected, and their descriptive statistics are given below, together with the same information on all of the measures.



Measure       N     Mean   Std Dev   Minimum   Maximum
EXPFBICAL    42     2.00     0.84       0.8       4.5
ACCFBICAL    42     2.29     0.80       0.8       4.45
SPENSELF     39     2.86     0.65       1.0       4.0
ENSPSELF     35     2.90     0.62       1.0       4.0
SPANSPK      34     4.03     1.05       2.0       5.0
DLPTLIST     27    52.70     5.15      39.00     60.00
DLPTREAD     27    53.04     6.57      30.00     60.00
ENGSPK       17     4.21     0.60       3.0       5.0
SPENTRAN     17     3.45     0.96       2.0       4.8
ENSPTRAN     17     3.29     0.65       1.8       4.0


Key

EXPFBICAL  Overall composite ILR expression score.
ACCFBICAL  Overall composite ILR accuracy score.
SPENSELF   Average score on the Spanish into English Verbatim Translation Ability Self-Assessment Questionnaire.
ENSPSELF   Average score on the English into Spanish Verbatim Translation Ability Self-Assessment Questionnaire.
SPANSPK    An OPI score for Spanish.
DLPTLIST   The listening section of the Defense Language Institute Proficiency Test. Maximum possible score = 60.
DLPTREAD   The reading section of the Defense Language Institute Proficiency Test. Maximum possible score = 60.
ENGSPK     An OPI score for English.
SPENTRAN   An ILR score on the current FBI Spanish into English verbatim translation exam.
ENSPTRAN   An ILR score on the current FBI English into Spanish verbatim translation exam.
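All ILR-based measures in the table above (SPANSPK, ENGSPK, SPENTRAN, ENSPTRAN) were coded with the ILR conversion listed under the Spanish OPI score, in which each "plus" level adds 0.8 to its base level. A minimal Python sketch of that lookup follows; the dictionary and function names are hypothetical, and the table is transcribed exactly rather than computed by rule, so levels it does not list are rejected.

```python
# Conversion table from the text: "+" levels are coded as base level + 0.8.
ILR_TO_NUMERIC = {
    "0+": 0.8, "1": 1.0, "1+": 1.8, "2": 2.0, "2+": 2.8, "3": 3.0,
    "3+": 3.8, "4": 4.0, "4+": 4.8, "5": 5.0,
}

def ilr_to_numeric(score):
    """Look up the numeric value for an ILR rating such as "2+"."""
    try:
        return ILR_TO_NUMERIC[score.strip()]
    except KeyError:
        raise ValueError(f"not an ILR level in the conversion table: {score!r}") from None

ratings = ["3+", "2", "4+", "3"]
values = [ilr_to_numeric(r) for r in ratings]
print(sum(values) / len(values))
```

Keeping the conversion as an explicit table makes the report's +0.8 convention visible and avoids silently accepting a level such as "5+" that the table does not define.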

Relationships between scores on these measures and scores on the ESVTE were calculated in order to examine the convergent/discriminant validity of the ESVTE.
7.4.1. Convergent Validity 

Correlations between the Total Accuracy and Expression 
scores on each form of the ESVTE with the criterion measures are 
presented in Table 17 below. (Note that the ESVTE total score in 
this table represents a composite of the two ratings. In 
addition, examinees were not penalized if they did not attempt a 
paragraph due to lack of time.) The number of subjects involved 
in the correlation is also given, since not every subject had a 
score on every measure; i.e., the numbers in parentheses 
represent the number of subjects who had a score on both measures 
being correlated. The magnitude of the Ns should be considered 
in making interpretations. Larger Ns allow a greater degree of confidence in the indicated relationship. In general, none of the Ns are large, suggesting that the correlations should not be considered stable.
Table 17
Correlations of the ESVTE Scores with Other Available Measures
(Numbers of Paired Scores in Parentheses)

        SPENSELF  ENSPSELF  SPANSPK  DLPTLIST  DLPTREAD  ENGSPK  SPENTRAN  ENSPTRAN
EXP1       --       .41*     .64*     .72*      .58*      .16     .22       .85*
          (39)      (35)     (34)     (27)      (27)      (17)    (17)      (17)
EXP2       --       .35*     .56*     .65*      .58*      .12     .10       .84*
          (39)      (35)     (34)     (27)      (27)      (17)    (17)      (17)
ACC1      .59*      .38*     .66*     .73*      .65*      .06     .04       .80*
          (39)      (35)     (34)     (27)      (27)      (17)    (17)      (17)
ACC2      .53*      .29      .59*     .70*      .77*      .19     .19       .75*
          (39)      (35)     (34)     (27)      (27)      (17)    (17)      (17)

* p < .05



We will now discuss the relationships in Table 17, referring again, when appropriate, to the data in Table 16. The accuracy of this discussion is tempered by the fact that no reliability statistics are available on any of these criterion measures. Even so, since these are the only data available, there is no option other than to examine and interpret the suggested relationships. Since the magnitude of these relationships is attenuated to the extent that the tests are less than perfectly reliable, one can generally assume that the relationships are at least as strong as indicated here. On the other hand, the reliability of the ESVTE scores does not pose a problem, since the reliability of both ESVTE total scores is quite high. (See sections 6.2 and 6.3.)
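The attenuation argument can be made concrete with Spearman's classical correction for attenuation, which divides an observed correlation by the square root of the product of the two reliabilities. In the minimal Python sketch below, the reliabilities are invented for illustration, because, as noted above, none are available for the criterion measures; the function name is also hypothetical.

```python
from math import sqrt

def disattenuate(r_xy, rel_x, rel_y=1.0):
    """Spearman's correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / sqrt(rel_x * rel_y)

# Hypothetical reliabilities; only the observed r comes from Table 17
# (ACC2 with ENSPTRAN).
observed = 0.75
print(round(disattenuate(observed, rel_x=0.90, rel_y=0.80), 2))  # -> 0.88
```

With samples as small as N = 17, corrected values can be unstable and may even exceed 1.0, so the correction is better read as a reason to treat observed values as lower bounds than as a precise estimate.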

First, it is most notable that there were low to moderate correlations, most of them significant, between the ESVTE Total Accuracy and Expression scores and six of the eight criterion variables. The correlations between the ESVTE Expression score and these six criterion variables were generally of about the same magnitude as those for the Accuracy score; taken together, 23 of the 24 correlations with these six variables are significant.
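Whether a correlation reaches p < .05 depends on N as well as on its size: in Table 17, EXP2 with ENSPSELF (.35, N = 35) is significant, while ACC2 with ENSPSELF (.29, N = 35) is the one nonsignificant value among these six variables. The Python sketch below approximates the two-tailed p-value with Fisher's z transformation. The report does not state which test it used, so this normal approximation is a stand-in rather than the authors' method; an exact test would use the t distribution with N - 2 degrees of freedom.

```python
from math import atanh, erfc, sqrt

def r_p_value(r, n):
    """Approximate two-tailed p-value for H0: rho = 0 using Fisher's z.

    z = atanh(r) * sqrt(n - 3) is treated as standard normal.
    r must lie strictly between -1 and 1, and n must exceed 3.
    """
    if n <= 3:
        raise ValueError("need n > 3")
    z = atanh(r) * sqrt(n - 3)
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))

# (r, N) pairs taken from Table 17.
for r, n in [(0.35, 35), (0.29, 35), (0.19, 17)]:
    print(r, n, round(r_p_value(r, n), 3))
```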

It is reasonable to expect the ESVTE to correlate significantly with English language ability, which in this case was represented only by a measure of oral proficiency (ENGSPK), given our discussion in the Introduction (section 1.5.3). One would postulate that examinees who are low in ENGSPK should do poorly in ESVTE Accuracy, since their lack of English ability would affect their ability to comprehend the texts to be translated on the Production section of the test. However, Table 17 shows that the correlations with ENGSPK were low and nonsignificant. The descriptive statistics on the previously obtained measures discussed in section 7.3 reveal the explanation for this lack of expected correlation. The English language skills of the group were much more homogeneous than the Spanish language skills. For a subsample of 18 examinees for whom English OPI scores (ENGSPK) were available, the mean was 4.20, the standard deviation was 0.58, and the range was 3.0 to 5.0. Furthermore, it is likely that this subsample of 18 examinees exhibited greater variation in English language proficiency than the total sample of 42, since an English OPI would not normally be given to a Special Agent. Thus, if data were available on all members of the sample, the true mean would probably be considerably higher (exhibiting a marked ceiling effect) and the standard deviation would be even smaller. With very little variation in English ability in the sample, there was no opportunity for English to play a role in the scores. Thus, we see that for this sample as a whole, the source language, English, did not play a significant role in accounting for variation in test scores.
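The explanation above is the familiar restriction-of-range effect: when a sample varies little on one variable, its correlation with other variables shrinks even if the underlying relationship is real. The small Python simulation below illustrates this. The variable `x` stands in for English proficiency and `y` for an ESVTE score; the population correlation of 0.6, the cutoff, and the sample size are arbitrary assumptions, not estimates from the study.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simulate_range_restriction(rho=0.6, cutoff=1.0, n=20000, seed=1):
    """Correlation in a full sample vs. a sample restricted to x > cutoff."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # y has correlation rho with x in the full population.
        y = rho * x + sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    # Keep only high-x cases, mimicking a group with uniformly high English.
    kept = [(x, y) for x, y in zip(xs, ys) if x > cutoff]
    rx, ry = zip(*kept)
    return pearson_r(xs, ys), pearson_r(rx, ry)

full_r, restricted_r = simulate_range_restriction()
print(round(full_r, 2), round(restricted_r, 2))
```

The restricted correlation comes out roughly half the full-sample value, even though the relationship between the two variables is identical in both.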

It should be emphasized that in spite of the findings for this sample, both Accuracy and Expression need to be assessed on an English to Spanish translation test. At present, high English proficiency cannot be assumed for all individuals in the examinee population, and it is likely that this situation will continue into the future. Indeed, English proficiency will become even more varied, since the FBI is actively recruiting Hispanics and speakers of non-English languages to meet its need for personnel who can handle the growing amount of crime in non-English languages. Since English proficiency cannot be assumed, it will continue to be necessary to score for both Accuracy and Expression. However, should continued use of the ESVTE indicate a similarly high correlation between the two scores, then the FBI could probably rely solely on the Expression score, since this is the one that most directly taps Spanish proficiency in the context of a translation. This could occur if all applicants have high English proficiency, e.g., an ENGSPK score of 4 or above. Since the ESVTE requires only receptive skills in English, it does not put as heavy a demand on English skills as it does on Spanish skills. Thus, Spanish plays a greater role in the Expression score than does English. English does play a role in the Accuracy score, but typically only when English skills are lacking. When an examinee has high English proficiency, as almost all of the examinees in the sample did, decoding the information in the source language text is not a problem. Under these circumstances, the problem for the examinee is encoding the text in Spanish, and it is here that proficiency is likely to vary significantly across individuals and thus play a determining role in the score.

Accuracy and Expression are usually moderately interrelated. In the case of this sample, the correlation between the ESVTE Accuracy and Expression scores was .96 for Form 1 and .90 for Form 2 (see Table 15). These high correlations between the two constructs differ from the more moderate correlations between these scores found on the Spanish-English Verbatim Translation Exam (SEVTE).³⁴ They suggest that a single skill, critical to both the Accuracy and Expression scores, is tested by both ESVTE scores. According to the way we have defined the abilities that enter into the constructs, if this skill is not English language proficiency, then it would have to be Spanish language proficiency. This is quite feasible, since this population of examinees showed a healthy degree of variation in Spanish language proficiency (mean = 4.03, SD = 1.05, range = 2.0 to 5.0 on the Spanish oral proficiency interview (SPANSPK)). It is this variation, then, that explains performance on both the Accuracy and Expression subscores for this sample.

³⁴ The correlation between Accuracy and Expression on the SEVTE was .74 for Form 1 and .75 for Form 2 (see Stansfield et al., 1990b).

In the tables above, we would expect a positive correlation between the ESVTE Accuracy score and the English into Spanish self-assessment of this ability (ENSPSELF). The ENSPSELF score is simply the mean self-rating assigned to items on the ENSPSELF questionnaire (Appendix N). These correlations, shown in the second column from the left of Table 17, are .38 for Form 1 and .29 for Form 2. (The latter correlation is not significant.) These modest correlations provide some initial support for the validity of the ESVTE. The correlations between ENSPSELF and ESVTE Expression (.41 for Form 1 and .35 for Form 2) are similarly modest. Again, no data are available on the reliability of the ENSPSELF questionnaire.³⁵

³⁵ The question of the reliability of the questionnaires used to calculate each subject's self-assessment score deserves some comment here. When dealing with the internal consistency reliability of a measurement instrument, the estimated reliability coefficient is an indication of the extent to which the items comprising the measure are tapping into the same underlying trait or ability. This assumes that each item was written to measure this trait or ability, and that all examinees would answer all items.

The nature of the two questionnaires from which self-assessment scores were calculated here was somewhat different in that each subject gave a self-rating only to a subset of the "items." These "items" were the document types with which he or she had experience. In the vast majority of cases, subjects did not have experience in translating all the document types; thus, self-rating scores were sometimes based on only 3 or 4 responses. The response on the other "items" was "Not Applicable," to which no reasonable numerical value could be assigned; "Not Applicable" means that the subject does not translate such document types.

When missing data occur in a questionnaire database, there are several ways to deal with the problem under certain circumstances. Inadvertently missing data may be replaced by an estimate of that subject's response to the item, such as his or her mean score on the items answered or the mean response of all subjects answering that item. On certain measures, such as an attitudinal questionnaire, a missing value may be appropriately interpreted as the subject's having no opinion or not caring about the issue in the item, and a missing value can then be replaced by a neutral response.

Had we been able to treat these responses as missing data, there would have been several ways to estimate the reliability of the two questionnaires. However, on the questionnaires used here, a response of "Not Applicable" is not missing data. To replace these responses with a numerical value (such as the subject's mean response) is contrary to the subject's own rating of "Not Applicable" for that "item" (document type). Furthermore, even if it were appropriate to treat the response as missing data, making the large number of replacements that would be required here would inflate reliability by increasing inter-item consistency in proportion to the number of "Not Applicable" responses replaced by each subject's mean response. The resulting estimate of reliability would thus be spuriously high and would not be interpretable.

The correlations between the ESVTE and the self-rating of ability to translate each of the 10 types of documents included on the ENSPSELF questionnaire are found in Appendix N. Given the relatively small proportion of Language Specialists in the sample, it is probable that the majority of examinees did not have much experience translating such documents on the job. An attempt was made to correct for this in the design of the questionnaire by telling people in the instructions, "If you have never translated a particular type of document, please mark N/A (not applicable)." While almost all subjects completing the questionnaire (35) indicated that they translated correspondence (letters) (97%), the mean number of document types responded to, out of 10, was 6.43. While every document type received at least a 46% response, the average examinee responded N/A to more than a third of the document types. Thus, it may be inferred that translation of documents other than letters is performed rarely by most examinees, and consequently that most examinees may not have had a valid basis for making judgments of their ability.

It is worthwhile to consider the correlations between ESVTE scores and the self-ratings of ability to translate the 10 document types included on the English-Spanish Self-Assessment Questionnaire. Sixteen of the 20 correlations between the ESVTE Accuracy score for Forms 1 and 2 and the 10 document types were significant. Only the rating of the ability to translate technical documents from English to Spanish did not correlate significantly. The correlations ranged from .28 to .64. The highest correlations were with the ability to translate FBI forms (.56 and .64),³⁶ depositions (.54 and .52), foreign counterintelligence status/evaluation reports (.57 and .51), letters rogatory (.45 and .59), police reports (.45 and .59), foreign diplomatic reports (.56 and .47), FBI training manuals (.42 and .53), and correspondence (.34 and .53). These correlations, individually and as a whole, provide evidence of the convergent validity of the ESVTE Accuracy score. The fact that the correlations are so similar for the two forms also bodes well for the comparability of the two forms. That is to say, they appear to measure the same construct.³⁷

³⁶ The first correlation in parentheses is with the Accuracy score for Form 1 and the second is with the Accuracy score for Form 2. All of the correlations and the Ns on which they are based are available in Appendix N.

Another overall measure of translation ability is the 
FBI's current English to Spanish translation test (ENSPTRAN) (see
column 8 in Table 17). The ESVTE Accuracy and Expression scores 
showed a high correlation with this test (.75 to .85). Although 
no evidence exists as to the reliability and validity of the 
ENSPTRAN, the high correlation found here supports the validity 
of both measures. 

Theoretically, the ability to translate from English to 
Spanish should require reading ability in the target language, 
which is Spanish. The measure of Spanish reading ability used 
here was the reading subtest of the DLPT. The ESVTE Accuracy 
score showed moderately high correlations (.65 and .77) with the 
DLPTREAD, which indicates that it is sensitive to Spanish reading 
proficiency. One would expect the ESVTE Expression score to be 
less related to Spanish reading ability than is ESVTE Accuracy, 
since the Expression score, strictly speaking, is supposed to 
refer to Spanish writing ability in the context of a translation.
The Expression correlations with DLPTREAD (.58 and .58) show that this was indeed the case.

³⁷ The correlations between the 10 document types and the ESVTE Expression score were lower, and only 3 of 20 were statistically significant.
Another measure of Spanish ability available was the Spanish OPI score (SPANSPK). There was a moderate correlation (.66 and .59) between SPANSPK and ESVTE Accuracy, confirming that Spanish language ability is related to the ability to translate information from English to Spanish. There was a similar correlation (.64 and .56) between SPANSPK and ESVTE Expression. This indicates that Spanish speaking ability is related to the ability to translate an English language text using appropriate Spanish written expression. This is as expected, and supports the validity of each of the ESVTE scores as a measure of English to Spanish translation ability.
7.4.2. Discriminant Validity

Another criterion-related approach to establishing construct validity is to consider all the measures as a whole and contrast the correlations. First, one begins with the measures that would be expected to show a low correlation with the ESVTE. Then, one contrasts these with the correlations for the measures that would be expected to correlate more highly with the ESVTE. If the correlation with the variables expected to be more relevant is indeed greater, then this is evidence of discriminant validity. Thus, one examines the magnitudes, the differences, and the direction of the differences in the correlations, to see if they fulfill a priori expectations. This process establishes the discriminant validity of the test under consideration. Using this approach, the data from the validation study generally support the construct validity of the ESVTE as a test of English to Spanish translation ability.

Two contrastable measures are the FBI's current translation tests (SPENTRAN and ENSPTRAN). One would expect a stronger relationship between the ESVTE and the ENSPTRAN than between the ESVTE and the SPENTRAN, since both the ESVTE and the ENSPTRAN purport to measure the ability to translate in the same direction. Such an outcome was clearly found. For all four comparisons, the ENSPTRAN showed a far stronger correlation (.75 to .85 versus .04 to .22). Furthermore, none of the SPENTRAN correlations were significant. Again, one must remember that these current FBI tests are considered to have unknown validity. Nonetheless, the high correlation between the ESVTE and the ENSPTRAN does provide evidence that both tests are measuring similar abilities. In contrast, the low, nonsignificant correlation with SPENTRAN confirms the need to measure translation ability in each direction (see the conceptual discussion in section 1.5.3).
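The contrast procedure described in this section (compare the correlation with the measure expected to be relevant against the correlation with the measure expected to be less relevant, and check the direction of each difference) can be sketched as follows for the SPENTRAN/ENSPTRAN comparison. The correlations are those reported for Table 17; the function name and data layout are hypothetical. The sketch checks direction and size only; it does not test whether each difference is statistically significant, which would require a test for dependent correlations.

```python
# Correlations from Table 17, keyed by (ESVTE score, other measure).
TABLE_17 = {
    ("EXP1", "ENSPTRAN"): 0.85, ("EXP2", "ENSPTRAN"): 0.84,
    ("ACC1", "ENSPTRAN"): 0.80, ("ACC2", "ENSPTRAN"): 0.75,
    ("EXP1", "SPENTRAN"): 0.22, ("EXP2", "SPENTRAN"): 0.10,
    ("ACC1", "SPENTRAN"): 0.04, ("ACC2", "SPENTRAN"): 0.19,
}

def check_contrast(table, relevant, contrast,
                   scores=("EXP1", "EXP2", "ACC1", "ACC2")):
    """For each ESVTE score, return r(relevant) - r(contrast).

    A positive difference is the direction expected a priori.
    """
    return {s: round(table[(s, relevant)] - table[(s, contrast)], 2)
            for s in scores}

diffs = check_contrast(TABLE_17, relevant="ENSPTRAN", contrast="SPENTRAN")
print(diffs)
print("all in expected direction:", all(d > 0 for d in diffs.values()))
```

The same function applied to SPENSELF versus ENSPSELF would flag the unexpected reversal discussed next.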

Two other contrastable measures are the self-assessment questionnaires (SPENSELF and ENSPSELF) completed by examinees prior to the exam. One would expect to find a stronger relationship between ESVTE scores and the ENSPSELF than between the ESVTE scores and the SPENSELF, since the SPENSELF is a rating of ability to translate in the opposite direction. Columns one and two of Table 17 indicate that this did not turn out as expected. All four of the SPENSELF correlations are larger than the corresponding ENSPSELF correlations.³⁸

Another issue is the relative i»portance of the two 
languages to the two scores. One would expect the ESVTE 
Expression score to be »ore strongly related to Spanl *i 
proficiency than to English proficiency, since, on the ESVTE, the 
examinee actually performs in Spanish. The one measure of 
English proficiency available is ENGSPK and the three measures of 
Spanish proficiency available are SPANSPK, DLPTLIST, and 
DLPTREAD. The ESVTE E^rpression score shows n far greater 
correlation with SPANSPK (.64 and .56) than with ENGSPK (.16 and 
•12), which is a measure of the corresponding skill (speaking). 
ESVTE Expression also shows a higher correlation with DLPTREAD 
(Spanish reading) (.58 and .58) than with ENGSPK, which is also 
as one would expect. Similarly, the ESVTE Expression correlation 
with DLPTLIST (.72 and .65) far exceeds the correlation with 
ENGSPK. All these correlations suggest that Spanish language 
ability is strongly correlated to success on both ESVTE measures. 



"it is probable that this outcome was again due to the 
characteristics of the sample. Few members of the sample had 
the opportunity in their work to do many English to Spanish 
translations. This is verified by their responses to the 
statement discussed earlier on page 84, ••The material in the 
exams was representative of the types of written documents I 
encounter in my work." Only 37% of the examinees agreed with 
this statement in reference to the ESVTE, while 50% agreed in 
reference to the SEVTE (see Stansfield et al., 1990b). Still, 
all subjects completed both the ENSPSELF and the SPENSELF 
questionnaires. The greater validity coefficients for the 
SPENSELF are probably due in part to the fact that subjects were 
able to make more informed judgments in the SPENSELF than on the 
ENSPSELF. Since the ENSPSELF ratings were less valid, there was 
less opportunity for them to correlate with ESVTE scores. 
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while English language ability is not* They also suggest that 
azQong second language learners, Spanish listening, speaking, and 
reading abilivy is highly correlated with Spanish writing 
ability, which is a good part of what is measured by ESVTE 
Expression. On the other hand, for the same group, largely 
composed of educated native speakers of English, English speaking 
ability (ENGSPK) would not be expected to correlate with the 
ability to translate into Spanish, and indeed, it did not. 
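As a quick arithmetic check, the sketch below averages the Expression coefficients quoted in the paragraph above (two per measure, presumably one per form). The footnote comparing the two scores reports that the six Expression correlations with the three Spanish measures average .62, and these values reproduce that figure. The sketch assumes a simple unweighted mean; the report does not say whether coefficients were averaged directly or through a Fisher z transformation, and the dictionary layout and function name are illustrative only.

```python
from statistics import mean

# ESVTE Expression correlations quoted in the text (two coefficients per measure).
expression_r = {
    "SPANSPK": (0.64, 0.56),   # Spanish speaking
    "DLPTREAD": (0.58, 0.58),  # Spanish reading
    "DLPTLIST": (0.72, 0.65),  # Spanish listening
    "ENGSPK": (0.16, 0.12),    # English speaking
}


def mean_r(table, measures):
    """Unweighted mean of every coefficient listed for the given measures."""
    return mean(r for m in measures for r in table[m])


spanish_mean = mean_r(expression_r, ("SPANSPK", "DLPTREAD", "DLPTLIST"))
english_mean = mean_r(expression_r, ("ENGSPK",))
print(f"Spanish measures: {spanish_mean:.2f}")  # 0.62
print(f"English speaking: {english_mean:.2f}")  # 0.14
```

The gap between the two means (.62 versus .14) is the pattern the paragraph describes: Expression tracks Spanish proficiency, not English.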

Similarly, one would expect the ESVTE Accuracy score to be
more strongly related to proficiency in English than is
Expression.* The data for the three measures of Spanish
(SPANSPK, DLPTLIST, DLPTREAD) do not show this to be the case.
In fact, neither ESVTE score correlates with English proficiency
for this sample.**



* Accuracy requires the correct comprehension of the English
language propositions, whereas Expression does not. That is, one
can score high on Expression and still not render an accurate
translation.

** It is not possible to say which of the two ESVTE scores is
more valid. The ESVTE Accuracy score seems to correlate slightly
higher with the three Spanish language measures than does ESVTE
Expression, which is not as one might expect. That is, we would
expect target language proficiency to correlate more highly with
the Expression score than with the Accuracy score. The mean of
the six Accuracy correlations with the three Spanish language
measures (see the lower half of columns three, four and five in
Table 17) is .68, while the mean of the six Expression
correlations is .62. This suggests that Accuracy may have
slightly more validity as a measure of English to Spanish
translation ability. On the other hand, for the two measures of
English to Spanish translation ability (ENSPTRAN and ENSPSELF)
the mean of the four correlations with the Expression score is
.61, while the mean of the four correlations with the ESVTE
Accuracy score is .56. This would suggest that the Expression
score may have slightly more validity as a measure of English to
Spanish translation ability. Given this difference in results,
it is not possible to say which of the two ESVTE scores is more
valid. Rather, it is only possible to say that they both appear
to be valid.

Similarly, since Accuracy theoretically involves both
languages about equally, one would expect fairly similar
correlations between Accuracy and corresponding measures of
proficiency in both languages. A comparison of the correlations
with oral proficiency in the two languages, which is the only
measure for which corresponding scores are available in the two
languages, shows that the correlations between Accuracy and
SPANSPK far exceed the correlations between Accuracy and ENGSPK.
Thus, for this sample, Accuracy does not appear to be testing
reading ability in English; rather, it is almost exclusively
testing encoding ability in Spanish.

Given the high correlations between both ESVTE scores and
measures of Spanish language ability, and their lack of
correlation with English language ability, it is plausible to
hypothesize that the ESVTE is not a measure of translation
ability at all, but merely a job-related test of Spanish language
proficiency. The fact that the two scores were found to measure
the same construct when they were postulated to measure different
dimensions of translation ability lends additional credibility to
this hypothesis. However, the hypothesis can be more directly
addressed by comparing the magnitude of the ESVTE correlations
with the standardized measures of Spanish ability and with English
to Spanish translation ability (ENSPTRAN). In this case, the mean
of the four correlations (see Table 17) with the FBI's existing
English to Spanish translation test is .81, while the mean of the
12 correlations with the Spanish language measures is .65. This
difference in the magnitude of the correlations supports the
claim that the ESVTE is not merely a measure of Spanish language
proficiency. Instead, the ESVTE appears to be a measure of
English to Spanish translation ability, but one that is closely
related to Spanish language ability, for a sample characterized
by high and fairly homogeneous proficiency in English and varying
proficiency in Spanish.
Conclusions 

From this discussion of the validity of the ESVTE through 
the examination of the construct, criterion-related, convergent 
and discriminant relationships with other measures, four 
conclusions can be reached. 

First, ESVTE Accuracy and Expression measure the same 
construct, at least for a sample of examinees characterized by 
high proficiency in English and varying ability in Spanish. The 
two measures are highly correlated (.96 on Form 1 and .90 on Form
2), suggesting that both scores provide the same information and
that either score can serve as a substitute for the other. 

In spite of this conclusion, it would be inappropriate at 
this time to determine only a single score on the test. The 
theory of the dimensions of translation ability discussed in the 
introduction, and the results of research on the SEVTE suggest 
strongly that both scores may be necessary in order to fully 
appreciate an individual's translation ability. If additional
samples of ESVTE examinees show high English ability and varying 
Spanish ability, then it would be possible to conclude that such 
is the nature of the ESVTE examinee population. Only if the 
population can be shown to be similar to the sample that 
participated in this study could a single score serve adequately 
to measure translation ability. 

Second, both ESVTE Accuracy and ESVTE Expression appear to 
be valid measures. Both were found to correlate highly with 
translation skill levels assigned by comparing direct 
translations to the FBI/CAL translation skill level descriptions. 
ESVTE Accuracy and Expression scores were found to correlate with 
the FBI's current English to Spanish translation test, with self- 
ratings of ability to translate various kinds of English language 
documents on the job, and with scores on all Spanish language 
proficiency tests, including measures of listening, speaking, and 
reading. 

Third, neither score seems to be superior to the other for 
a sample with these characteristics. That is, both scores seem 
to correlate about equally with the criterion variables. These 
criterion variables include three standardized measures of 
Spanish language proficiency, an existing English to Spanish 
translation test, and self-ratings of English to Spanish
translation ability. 

Fourth, the language of the target document, Spanish,
plays a major role in both the ESVTE Accuracy and Expression
scores. On the other hand, the language of the source document,
English, appears to play almost no role in ESVTE scores, at least
for a sample of examinees characterized by high proficiency in
English and varying ability in Spanish.*

These conclusions provide strong support for the validity 
of ESVTE scores as measures of overall English to Spanish 
translation ability. 



* It is clear that for the sample that participated in the
ESVTE validation study there was a "threshold effect" for English 
language proficiency. Under a threshold effect, once scores 
reach a certain level, the trait being measured ceases to play a 
major role in the prediction of the criterion variable. In this 
case, for examinees with high English proficiency, English 
proficiency ceases to be a predictor of English to Spanish 
translation ability. It is probable that the threshold of 
English proficiency is between 4.0 and 4.8 on the ILR scale. 
After one surpasses this threshold, minor variations in English
proficiency no longer play an important role in ESVTE scores or 
even in English to Spanish translation ability. Thus, the fact 
that one has high English proficiency says very little about 
one's English to Spanish translation ability. However, for those 
individuals with low English proficiency, English proficiency (or 
the lack of it in this case) does play a significant role in 
ESVTE scores and one can assume that a person with low English 
proficiency will be deficient in English to Spanish translation 
ability. 
8. Construction of Translation Skill Level Score Conversion
Tables for the ESVTE

This section describes the construction of tables to
convert raw scores on the ESVTE for Expression and Accuracy to
FBI/CAL Translation Skill Levels (TSLs). In order to make
decisions on the basis of test scores, compare test scores across
forms, and interpret test scores, raw scores on the ESVTE must be
converted to TSL scale scores.
8.1 Overview

In most of the preceding discussion of the ESVTE, raw 
scores have been used. However, one of the goals of the project 
was to be able to interpret test scores in a way that is 
grounded in the Translation Skill Level Descriptions.* This
entailed the construction of raw score-to-TSL score conversion 
tables for Expression and Accuracy for each section and each form 
of the test. These are presented in Appendix O. 

Construction of the scaled score conversion tables is an
attempt to give interpretive meaning to the ESVTE raw scores.
In addition, it enables the comparison of total scores across
forms and, to an extent, across the Multiple Choice section on
the two forms. Conversion into scaled scores takes into account
differences in test difficulty. Thus, comparisons of results
across test forms and subtests must be made only in terms of the
TSL scores.



* The Statement of Work in the RFP issued by the FBI for
this project called for the development of a test "which would
ultimately result in a score which can be converted to the 0
through 5 scale."

8.2 Determining Contributors to Expression and Accuracy Total
Scores

Given the format of the test and the scoring system, there
was a total of 185 possible points on the test when all the
subscores were added together. However, after the data were
collected, it became apparent that there should be separate
scores for Expression and Accuracy. (See the discussion of the
history of the SLDs and the discussion of the constructs in
sections 1.4.1 and 1.5.3.) Based on our conceptualization of
the constructs, it was clear that scores for paragraph expression
(PEX), paragraph grammar (PGR) and paragraph mechanics (PME)
should contribute to the total Expression score, while sentence
accuracy (SAC) and paragraph accuracy (PAC) should contribute to
the total Accuracy score. To determine to which score the
Multiple Choice (MC) section and the Word and Phrase Translation
subsection belonged, a multiple regression "r-square" analysis
was performed. An r-square analysis determines the r-square
value (percent of variance shared by the combination of the
variables with the criterion) of all combinations of the
variables entered into the equation when regressed on the
criterion (overall EXPFBICAL and overall ACCFBICAL). Both MC
scores and Word and Phrase Translation scores were entered into
the r-square analysis together with scores for Paragraph
Expression, Paragraph Grammar and Paragraph Mechanics, using the
overall FBI/CAL Expression score as a criterion. In addition,
both MC scores and Word and Phrase Translation scores were
entered into the r-square analysis together with Sentence
Accuracy and Paragraph Accuracy scores, using the overall FBI/CAL
Accuracy score as a criterion. The results of all the r-square
analyses (Expression and Accuracy scores for the two forms of the
SEVTE and the two forms of the ESVTE) were examined together.
Results indicated that, although MC and Word and Phrase
Translation scores contributed to both Expression and Accuracy
scores, the most parsimonious combination of scores was for MC to
be used as a subscore for Expression and for the Word and Phrase
Translation score to be used as a subscore for Accuracy.
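The r-square analysis described above is, in effect, an all-subsets regression: every combination of candidate subscores is regressed on the criterion and its R-squared recorded. What follows is a minimal pure-Python sketch of that idea, not the software used in the study. The labels PEX and MC follow the text's abbreviations, WPT is a hypothetical label for the Word and Phrase Translation score, and the six examinees' scores are invented; the criterion is constructed as PEX + 2*MC so that this pair should reach an R-squared of 1.

```python
from itertools import combinations


def r_squared(columns, y):
    """R^2 of an OLS regression of y on the given predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]  # fails if predictors are perfectly collinear
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    coef = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(coef, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def all_subsets(predictors, y):
    """Map every non-empty combination of named predictors to its R^2."""
    names = list(predictors)
    return {
        combo: r_squared([predictors[name] for name in combo], y)
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
    }


# Hypothetical subscores for six examinees (not study data).
predictors = {
    "PEX": [1, 2, 3, 4, 5, 6],
    "MC": [2, 1, 4, 3, 6, 5],
    "WPT": [3, 1, 2, 5, 4, 6],
}
criterion = [5, 4, 11, 10, 17, 16]  # built as PEX + 2*MC, so that pair fits exactly

results = all_subsets(predictors, criterion)
for combo, r2 in sorted(results.items(), key=lambda item: -item[1]):
    print(" + ".join(combo), f"{r2:.3f}")
```

Picking the "most parsimonious" combination from such a table is still a judgment call: one looks for the smallest set of subscores whose R-squared comes close to that of the full set.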

Once these combinations of subscores were determined, we
examined whether there was anything to be gained by 
differentially weighting the different subscores to produce the 
total score. Regressions were run to determine the maximum 
amount of variance shared between the optimal combination of 
subscores and the corresponding criterion variable. These were 
compared to forming total scores without differential weighting. 
This analysis revealed that little was to be gained by weighting 
for any of the ESVTE scores. 
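Whether differential weighting pays off can be judged by comparing the R-squared obtained with optimal regression weights against the squared correlation of a simple unweighted sum. For two standardized subscores there are closed-form expressions for both, and the sketch below uses them. The correlation values in the example are hypothetical, and the study's totals were sums of raw subscores (in some cases more than two), so this only illustrates the logic of the comparison, not the analysis actually run.

```python
from math import sqrt


def r2_optimal(r1y: float, r2y: float, r12: float) -> float:
    """Squared multiple correlation of y with two predictors under least-squares weights."""
    return (r1y**2 + r2y**2 - 2 * r1y * r2y * r12) / (1 - r12**2)


def r2_unit_weighted(r1y: float, r2y: float, r12: float) -> float:
    """Squared correlation of y with z1 + z2, an equally weighted sum of standardized subscores."""
    # cov(z1 + z2, y) = r1y + r2y and var(z1 + z2) = 2 + 2*r12 when everything is standardized.
    return ((r1y + r2y) / sqrt(2 + 2 * r12)) ** 2


# Hypothetical correlations: each subscore with the criterion, and with each other.
r1y, r2y, r12 = 0.85, 0.80, 0.75
best = r2_optimal(r1y, r2y, r12)
unit = r2_unit_weighted(r1y, r2y, r12)
print(f"optimal weights: {best:.4f}, unit weights: {unit:.4f}, gain: {best - unit:.4f}")
# optimal weights: 0.7829, unit weights: 0.7779, gain: 0.0050
```

When the two subscores correlate equally with the criterion (r1y = r2y), the two expressions coincide exactly, which is one reason equal weighting often loses very little.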

8.3 Development of Raw Score to Scaled Score Conversion Tables 

Since one of the goals of the project was to provide
translation ability scores based on the TSL descriptions, it was
necessary to identify a procedure that would anchor ESVTE scores,
which are analytical, to the holistic TSL descriptions. This was
accomplished during the validation study (see section 7.2) by
having each rater assign to each paper, separately for Expression
and Accuracy, a holistic rating based on the
FBI/CAL translation skill level descriptions. This procedure
produced four holistic ratings for Accuracy and four holistic
proficiency ratings for Expression. These two sets of four
holistic proficiency ratings were then averaged separately to
give each examinee an overall FBI/CAL TSL score for Expression
and Accuracy.

To develop a conversion table of raw ESVTE scores to TSL
scores, total raw scores for Expression and Accuracy for all
subjects were averaged between raters. These total raw scores
were then regressed on the corresponding overall FBI/CAL
translation skill level (Expression or Accuracy). As shown in
Table 15, correlations between the total ESVTE scores and these
overall scores were very high: from .90 to .91 for Expression
and from .91 to .92 for Accuracy. These high correlations
produced optimal regression equations for predicting TSL scores
from raw scores on each form of the test. These equations were
then used to produce predicted TSL scores from all possible ESVTE
scores for each form.* These conversion tables are presented in
Appendix O.



* For a considerable number of examinees on each form of the
test, this regression line resulted in a perfect prediction.
That is, the overall TSL rating predicted by applying the
regression line to the raw score (or weighted score in the case
of Form 2 Expression) coincided exactly with the average TSL
rating assigned by the raters. However, there was a tendency
toward greater error among examinees who scored higher on the
ESVTE. This was due to a number of causes, including the
regression effect, sampling, and the speededness of the Paragraph
Translation subsection during the validation study. For
additional information on the accuracy of predicted Translation
Skill Levels, see CAL's memo to the FBI dated May 15, 1990, in
Appendix xxx.

8.4 Using the Multiple Choice Section as a Screener

The Multiple Choice section of the ESVTE may be used to
screen out individuals for whom the production section of the
test is inappropriate. Section 2.4 of this report describes how
it was determined to use the multiple choice section score as a
screen. The Multiple Choice score selected (mentioned below) is
the best predictor of a TSL rating of 2.0 on the combined
multiple-choice and production sections of the ESVTE. Examinees
who score below this level are unlikely to score a 2.8 (2+) or
above on the total test after their raw score has been converted
to the corresponding TSL score for Accuracy. The ESVTE total
score corresponding to a TSL of 2+ is the recommended passing
score; that is, the minimum score at which examinees can serve as
translators for the FBI.

In using the ESVTE MC as a screen, the most serious error
one can make is to exclude someone from taking the Production
section who may ultimately score a 2+ or above. Giving the
Production section to someone who may not ultimately score 2+ or
above is not a serious error, since this individual will
ultimately be evaluated correctly (after the production section
is scored). To determine the cut-off score on the Multiple
Choice section, we need to determine the raw score on the
Multiple Choice section that corresponds to a TSL score of 2;
that is, we need to determine the raw score on the MC section
that corresponds to a translation proficiency level of 2 for
accuracy.**

To determine the raw score on the MC section that
corresponds to a score of 2, raw scores on the MC section were
regressed on the overall Accuracy scores. (Note that for Form 1
the correlation between these two scores was .81; for Form 2 it
was .84. The root mean square error of the regression for Form 1
was .456 of a level; for Form 2 it was .411.) This analysis
revealed that the score of 33 would be the lowest predictor of a
score in the 2 range on both forms. Examinees who score below
this level on the Multiple Choice section of the ESVTE either
need not take the production section, or if they already have,
that section need not be scored.
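Sections 8.3 and 8.4 both rest on a straight-line regression that predicts a TSL from a raw score: the conversion tables list the prediction for every possible raw score, and the screening cut-off is the lowest raw score whose prediction reaches the target level. The sketch below illustrates both steps under stated assumptions: ordinary least squares with a single predictor, predictions clamped to the 0 through 5 scale (the report does not say whether any clamping was applied), and "in the 2 range" read as a predicted TSL of at least 2.0. The data points, the 50-point maximum, and all function names are hypothetical. The points lie exactly on a line so the result is easy to verify, and they do not reproduce the study's cut-off of 33.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def conversion_table(intercept, slope, max_raw, lo=0.0, hi=5.0):
    """Predicted TSL for every possible raw score 0..max_raw, clamped to [lo, hi]."""
    return {raw: min(hi, max(lo, intercept + slope * raw)) for raw in range(max_raw + 1)}


def lowest_raw_reaching(table, target):
    """Smallest raw score whose predicted TSL is at or above target, or None."""
    for raw in sorted(table):
        if table[raw] >= target:
            return raw
    return None


# Hypothetical (raw MC score, averaged Accuracy TSL) pairs, not study data.
mc_raw = [16, 24, 32, 40, 48]
accuracy_tsl = [1.0, 2.0, 3.0, 4.0, 5.0]

intercept, slope = fit_line(mc_raw, accuracy_tsl)  # -1.0 and 0.125 for these points
table = conversion_table(intercept, slope, max_raw=50)
cutoff = lowest_raw_reaching(table, target=2.0)
print(f"TSL = {intercept:.3f} + {slope:.3f} * raw; screening cut-off = {cutoff}")
```

Scanning the finished table, rather than solving the regression equation for the raw score, keeps the cut-off consistent with whatever rounding or clamping the published conversion table uses.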

Using these cut-off scores would still leave in many
examinees who may not ultimately achieve a score at or above 2+
in Accuracy on their total test; however, the probability of
excluding a candidate who might achieve a 2+ in Accuracy on the
total test is minimal.



** There are a number of reasons for regressing the multiple
choice section on the Accuracy total score. Accuracy is a more
fundamental component of translation ability as indicated in
sections 1.4 and 1.5. In addition, the purpose of a screening
test is to predict performance on another test. In this case,
the multiple choice section is the screening test and the other
test is the production section, which requires the examinee to
render translations directly and requires the rater to evaluate
translations directly. Only part of the production section is
scored for Expression, but all is scored for Accuracy. If the
multiple choice section were regressed against the Expression
part of the production section only, then the screening test
would be correlated with only one of three parts in the
production section. Thus, there would be less evidence of the
validity of the screening test as a measure of translation
ability.

APPENDIX A



ADMINISTRATION INSTRUCTIONS FOR ESVTE 



TEST ADMINISTRATION INSTRUCTIONS 



ENGLISH INTO SPANISH VERBATIM TRANSLATION EXAM 




NOTE TO TEST ADMINISTRATOR 

This manual describes important information about the
procedures that must be followed BEFORE, DURING, and AFTER the
administration of the translation exams. Uniform procedures are
essential for the translation exams to yield reliable test results. The
scores of all examinees from various field offices in the nation will be
comparable only if all test administrators follow the same procedures
and give exactly the same instructions. It is necessary, therefore, that
you read the entire manual before administering the exams and follow
the instructions without exception when administering the exams.

GENERAL INFORMATION



Test Security 

It is extremely important that the translation exams be safeguarded and
administered under secure conditions at each field office. In order to ensure test 
security, it is essential that you adhere to the following conditions: 

1. Keep all test materials either in your immediate physical possession or in a 
locked cabinet or other secure area under your control. 

2. Do not copy, or allow others to copy, any portion of the test booklets or tape, or 
make any notes or transcriptions of the test booklets or tape content. 

3. Allow only those particular individuals who are to be tested to see the test
materials, and only at the time of test administration and under the specific 
procedures described in this manual. 

4. Should any irregularities occur, report them on the Test Administrator Report
Form included in the test package. Please complete and sign this form even if 
no irregularities occur. 



PRIOR TO THE TESTING DATE 

Assembling Test Materials 

Assemble as many test booklets and answer sheets as will be needed for the test 
administration, including two or three extra copies of each. You should also have on
hand at least two no. 2 pencils (with erasers) for each examinee. Listed below are the 
materials needed for each exam: 

1) Multiple Choice Section test booklets 

2) Production Section test booklets 

3) Answer sheets 

4) No. 2 pencils 

5) A timer, wristwatch or other timepiece which can be reset 

Arranging for a Testing Site 

Locate a testing site that is comfortable and free from distraction. The testing
room should be large enough so that examinees can be seated with three feet of space 
in all directions between all examinees. 



ON THE TESTING DATE 



Equipment 

Check to make sure the timepiece is functioning properly and has been 
completely reset to zero (or 12:00). There should always be at least two timepieces in 
the testing room as a check against mistiming. 

Prohibited Materials 

While taking the Multiple Choice Section and the Translation of Words and 
Phrases in Context and Sentence Translation Section, examinees should not have 
anything on their desks except their pencils, test booklets, and answer sheets.
Examinees may use dictionaries only during the Paragraph Translation Section. 

Administering the Test 

Follow the procedures below when administering the test. All instructions within
the boxes should be read verbatim. Pause where four dots appear to allow time for the
procedure described to be carried out. Be sure you state the correct form where
appropriate. Do not depart from these directions unless noted otherwise. 

1. After all examinees have been seated, distribute the Multiple Choice Section test
booklets, answer sheets, and pencils. 

2. Give the following instructions: 



Please do not open your test booklet. In this section of the exam, you will
mark all of your answers on the answer sheet. Do not write anything in the test
booklet. You must use a no. 2 pencil for marking your answers.
3. Instruct the examinees how to fill out the answer sheet:



Place your answer sheet on top of your test booklet. Turn the answer sheet so
that you see SIDE ONE in the upper right hand corner....

On the left half of side one, you will see an area containing blue lines. At the
top of this section is the word NAME. Print your name in the boxes provided.
Print your last name, and then your first name. Leave a blank space between
your last name and your first name....

Now fill in the circles beneath the boxes in which you printed your name.
Each circle you fill in must correspond to the letter you printed in the box above.
Be sure that you darken the circle so that the letter within the circle is completely
covered. You should not be able to see the letter. If you make a mistake, erase
the mistake completely. Do not make any extra marks on your answer sheet.
Your answer sheet will be scored by a machine. If you do not mark it carefully, it
may not be processed accurately by the scoring machine.

Now find the section labeled IDENTIFICATION NUMBER in the bottom left
half of your answer sheet. Print your SOCIAL SECURITY NUMBER in the boxes
labeled A through I....

Now fill in the circles beneath the boxes in which you printed your social
security number. Each circle you fill in must correspond to the number you
printed in the box above....

Now find the section labeled SPECIAL CODES, located to the right of the
section you just completed. [GIVE THE FOLLOWING INSTRUCTIONS IN
ACCORDANCE WITH THE FORM NUMBER OF THE EXAM YOU ARE NOW
ADMINISTERING:] Print the number [ONE or TWO] in box K. This is
[FORM 1 or FORM 2] of the English into Spanish Verbatim Translation exam.
You do not need to fill in your birth date, sex, or level of education....

Now look at the right half of your answer sheet. Notice that the first fifty
items are arranged in columns in the top section of the answer sheet, while the
next fifty items are arranged in the bottom section. Make sure you follow the
order of the items as they are marked. For example, after question number ten,
you will need to return to the top of the section to mark your answer to question
number eleven.
Are there any questions? Try to answer every item, but do not be concerned if
you cannot answer all of them. You will not be penalized for guessing. If you
are unsure of the answer to a question, make your best guess. You can then go on to
the next question. The verbatim translation exam takes approximately two hours
and ten minutes to complete.



4. Instruct the examinees to begin the Multiple Choice Section:



Now remove from your desk everything except your test booklet, answer sheet,
pencils, and erasers....

Look at your test booklet for the Multiple Choice Section of the English into
Spanish Verbatim Translation Exam. Print your name in the space provided on
the cover. Print your last name first....

Print today's date in the space provided....

There are two parts in this section. You will be allowed a total of thirty-five
minutes to complete both parts. I will advise you when there are five minutes
remaining. You may now open your test booklets and begin the test. [START
TIMER IMMEDIATELY]



5. Walk about the room to make sure that everyone is marking their answers
correctly on the answer sheet.



6. After 30 minutes, inform examinees:



There are five minutes remaining to complete this section.
7. After 35 minutes, STOP AND RESET THE TIMER. Inform examinees:



This is the end of the Multiple Choice Section. Please stop working now.
Now look over your answer sheet carefully. Be sure all the marks you made are
dark and heavy. Insert your answer sheet in your test booklet and close the
booklet.



8. Collect the test booklets and answer sheets for the Multiple Choice Section. Be 
sure to account for all test booklets distributed. 

9. Distribute the Words and Phrases in Context and Sentence Section booklets.
Instruct the examinees to begin this section: 



There are two parts in the next section. You may not use your dictionary
during this section. You will be given 35 minutes to complete the two parts in
this section, the Translation of Words and Phrases in Context and Sentence
Translation. I will advise you when there are five minutes remaining to finish this
section. You may now open your test booklets and begin working. [START
TIMER IMMEDIATELY]



10. After 30 minutes, inform examinees:



There are five minutes remaining to complete this section.



11. After 35 minutes, STOP AND RESET THE TIMER. Inform examinees:



Please stop working now. We will now have a short rest break. We will begin
the Paragraph Translation Section in five minutes. You may leave the room if
you wish.

12. Collect the test booklets for the Words and Phrases in Context and Sentence
Section. Be sure to account for all test booklets distributed. 



13. Distribute the Paragraph Translation Section booklets. Instruct the examinees to
begin the Paragraph Translation Section: 



We will now begin the Paragraph Translation Section. In this section, you will
translate three paragraphs. You may use dictionaries during this part of the
exam. You will have 48 minutes to complete the Paragraph Translation Section.
I will inform you when there are five minutes remaining. When you have finished
this section, please close your test booklets and wait for further instructions. You
may now begin. [START TIMER IMMEDIATELY]



14. After 43 minutes, inform examinees: 



There are five minutes remaining. 



15. After 5 more minutes (48 minutes in total), inform examinees:



Please stop working now. Close your test booklets.



16. Collect the test booklets for the Paragraph Translation Section. 
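For administrators who want a one-page checklist, the timer cues in steps 4 through 15 can be tabulated as below. This is only an illustrative sketch: the class, constant, and function names are invented, while the durations (35, 35, and 48 minutes, a warning five minutes before each end, and a five-minute break) come from the script above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedSection:
    name: str
    minutes: int  # time allowed, measured on a timer started at zero


WARNING_LEAD = 5   # "There are five minutes remaining."
BREAK_MINUTES = 5  # rest break before the Paragraph Translation Section

SECTIONS = [
    TimedSection("Multiple Choice Section", 35),
    TimedSection("Words and Phrases in Context and Sentence Section", 35),
    TimedSection("Paragraph Translation Section", 48),
]


def timer_cues(sections, lead=WARNING_LEAD):
    """List (section, timer minute, action) cues in the order they occur."""
    cues = []
    for s in sections:
        cues.append((s.name, s.minutes - lead, "announce five minutes remaining"))
        cues.append((s.name, s.minutes, "stop; collect booklets"))
    return cues


for name, minute, action in timer_cues(SECTIONS):
    print(f"{name:<52} {minute:>3} min  {action}")

working = sum(s.minutes for s in SECTIONS)
print(f"Timed work: {working} min; including the break: {working + BREAK_MINUTES} min")
```

The scripted working time comes to 123 minutes including the break; the remainder of the approximately two hours and ten minutes quoted to examinees is not itemized in the manual.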



Test Administrator Report Form 



ENGLISH INTO SPANISH VERBATIM TRANSLATION EXAM 



This form is to be used to report any irregularities in test administration. Please fill it out
(even if there were no irregularities), sign your name, and return it with the test materials. 
Thank you. 



Test Security 

By agreeing to serve as the test administrator, I am responsible for ensuring the security of 
the test. I have kept the test materials confidential and secure at all times. None of the test
booklets or test tapes has been reproduced in any form. 

Irregularities: 



Test Administration 



The tests were administered in exact accordance with the procedures described in the 
Administration Manual. Any deviations from the stated procedures are listed below: 



Irregularities:



Condition of Test Materials 

Before returning the test materials, I have checked the condition of the test booklets and 
test tapes. All materials are being returned in their original condition. 



Irregularities: 



(Please print name) Field Office 

Signature

NAME



Last First
DATE 



ENGLISH INTO SPANISH VERBATIM TRANSLATION EXAM 

MULTIPLE CHOICE SECTION 
FORM 1 




This test is for official use only; do not divulge any information contained herein.
Do not duplicate any portion of this test. Do not show to unauthorized persons.



FIELD OFFICE



TEST NO. 



ENGLISH INTO SPANISH VERBATIM TRANSLATION EXAM (ESVTE) 



MULTIPLE CHOICE SECTION: INSTRUCTIONS AND EXAMPLE ITEMS 
EMBEDDED PHRASE ITEMS



Instructions: Choose the best translation for the underlined portions of the following
sentences. If there is more than one possible answer, choose the most appropriate
translation. Consider how the entire sentence should be translated when choosing the
correct answer. On your answer sheet, find the number of the question and blacken
the space that corresponds to the letter of the answer you have chosen.

Example: The children are playing in the snow.

(A) nube
(B) nieve
(C) lluvia
(D) sol

Discussion: Nieve is the correct translation of snow; therefore, the answer is (B).

ERROR DETECTION ITEMS

Instructions: Blacken the space corresponding to the letter of the incorrect part of the
sentence on your answer sheet. If there is no error, choose (D). There cannot be
more than one error in each sentence. Possible errors include: incorrect grammar,
word order, vocabulary, punctuation or spelling.

Example: El gato de mi vecino está blanco; el mío es negro.
No error (D)

The correct choice is (C). Es should be used in this sentence instead of está
because the adjective blanco refers to a characteristic rather than a temporary state of
the cat. The second portion of the sentence, el mío es negro, uses the correct verb.
NAME



Last First
DATE 



ENGLISH INTO SPANISH VERBATIM TRANSLATION EXAM

PRODUCTION SECTION 
FORM 1 




This test is for official use only; do not divulge any information contained herein.
Do not duplicate any portion of this test. Do not show to unauthorized persons.



FIELD OFFICE



TEST NO. 
ENGLISH INTO SPANISH VERBATIM TRANSLATION EXAM (ESVTE)
PRODUCTION SECTION: INSTRUCTIONS AND EXAMPLE ITEMS 



EMBEDDED PHRASES 

Instructions : After you have read each of the following sentences, translate the 
underlined portion into Spanish. Consider how the entire sentence should be translated 
before providing your answer. Use the space below each sentence. 

Example : He sent several books to me. 

Él me mandó

Discussion : The subject pronoun él is retained in the translation to avoid ambiguity,
although it is not generally required in Spanish. The indirect object pronoun me is
included in the translation even though it is not underlined in the original sentence,
because if the entire sentence were translated, it would be placed in front of the verb
(i.e., Él me mandó varios libros).



SENTENCES 

Instructions : After you have read the following sentences, translate them into Spanish.
Use the spaces provided. Make sure your rendition sounds natural in Spanish while
retaining the original meaning. 

Example : He didn't realize they already knew each other. 

Él no se dio cuenta que ya se conocían.

Discussion : The subject pronoun he has been retained in the translation to avoid
ambiguity, although it is not generally required in Spanish. The verb realize has been
translated by the idiomatic expression darse cuenta rather than realizar, a false cognate
(a word that looks like the English word but means something different in Spanish).
That is omitted in English, but que is required in Spanish. Both darse cuenta and
conocerse are reflexive verbs in Spanish. Note also that the subject pronoun they is not
necessary in Spanish.
CONTENT ANALYSIS OF ESVTE MULTIPLE CHOICE SECTIONS
The results of a content analysis of the ESVTE exam forms
are summarized below. (Note that although most test items assess
only knowledge of grammar or vocabulary, a few assess both.)

English into Spanish Verbatim Translation Exam

                                Items/Form 1   Items/Form 2

Grammar
  ser vs. estar                      2              2
  verb form                          3              5
  preterit vs. imperfect             1              2
  use of pronouns                    6              5
  use of subjunctive                 3              3
  use of preposition                 3              4
  subject/verb agreement             2              2
  verb tense                         2              2
  word order                         1              1
  gender                             1              1
  use of negative                    1              1
  adjective form                     1              0
  Total                             26             28

Vocabulary
  adjectival phrase                  5              4
  adverbial phrase                   4              4
  noun phrase                       11              9
  verb phrase                       11             14
  proverb                            0              1
  Total                             31             32

Punctuation                          1              1
Spelling                             2              2
No error                             5              5
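The subtotals in the summary table can be checked mechanically. The short Python sketch below is illustrative only: the dictionary names (`GRAMMAR`, `VOCABULARY`) and the `subtotals` helper are hypothetical, and each pair of counts is copied from the table as (Form 1, Form 2).

```python
# Item counts per content category, copied from the summary table as
# (Form 1, Form 2). The data layout is a hypothetical illustration.
GRAMMAR = {
    "ser vs. estar": (2, 2),
    "verb form": (3, 5),
    "preterit vs. imperfect": (1, 2),
    "use of pronouns": (6, 5),
    "use of subjunctive": (3, 3),
    "use of preposition": (3, 4),
    "subject/verb agreement": (2, 2),
    "verb tense": (2, 2),
    "word order": (1, 1),
    "gender": (1, 1),
    "use of negative": (1, 1),
    "adjective form": (1, 0),
}

VOCABULARY = {
    "adjectival phrase": (5, 4),
    "adverbial phrase": (4, 4),
    "noun phrase": (11, 9),
    "verb phrase": (11, 14),
    "proverb": (0, 1),
}


def subtotals(table):
    """Sum the Form 1 and Form 2 columns of one category table."""
    form1 = sum(f1 for f1, _ in table.values())
    form2 = sum(f2 for _, f2 in table.values())
    return form1, form2


if __name__ == "__main__":
    print("Grammar:", subtotals(GRAMMAR))
    print("Vocabulary:", subtotals(VOCABULARY))
```

Because a few items are coded under both grammar and vocabulary, these category totals are not expected to add up to the number of items on a form.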
CONTENT ANALYSIS

ENGLISH-SPANISH (EXAM I) 

1. a. vocabulary - adjective 
b. grammar - ser vs. estar 

2. vocabulary - noun phrase 

3. vocabulary - false cognate (adjective) 

4. a. vocabulary - verb

b. grammar - verb form (present vs. present progressive) 

5. vocabulary - adverbial phrase 

6. grammar - verb form (infinitive vs. gerund) 

7. vocabulary - adverb 

8. a. vocabulary - verb 

b. grammar - use of pronoun (indirect vs. direct object)

9. a. vocabulary - verb 

b. grammar - use of preterit vs. imperfect 

10. grammar 

11. vocabulary - adverbial phrase 

12. vocabulary - adjective 

13. vocabulary - noun 

14. vocabulary - verb phrase

15. vocabulary - verb phrase

16. a. vocabulary - verb phrase 

b. grammar - use of pronoun (reflexive) 

17. a. grammar - verb form (infinitive vs. present participle)

b. grammar - use of pronoun (reflexive)

18. vocabulary - noun phrase 

19. vocabulary - adjectival phrase 

20. vocabulary - verb 

21. vocabulary - noun 

22. vocabulary - noun 

23. vocabulary - noun 

24. grammar - use of subjunctive 

25. vocabulary - verb phrase 

26. grammar - use of subjunctive 

27. grammar - use of prepositions 

28. vocabulary - adjective 

29. vocabulary - verb phrase 

30. vocabulary - noun 

31. vocabulary - noun 

32. vocabulary - noun 

33. vocabulary - adverbial phrase 

34. vocabulary - verb phrase 

35. vocabulary - verb 

36. punctuation - comma 

37. grammar - subject-verb agreement 

38. grammar - use of preposition (por vs. para) 

39. grammar - verb form 

40. grammar - verb tense 

41. grammar - use of subjunctive 



42. grammar - use of preposition

43. grammar - subject-verb agreement

44. a. grammar - use of pronoun ("éste" as pronoun vs. adjective)

b. spelling - accent

45. grammar - use of pronoun (reflexive vs. objective)

46. grammar - word order (noun/adjective)

47. grammar - use of pronoun (objective)

48. grammar - gender (noun)

49. grammar - use of negatives (conjunction)

50. No error

51. grammar - verb tense sequencing

52. No error

53. grammar - adjective form

54. spelling

55. No error

56. No error

57. grammar - ser vs. estar

58. vocabulary - noun (gender)

59. vocabulary - false cognate (noun)

60. No error



GRAMMAR is tested: 26 times
  ser vs. estar: 2 times
  verb form: 3 times
  preterit vs. imperfect: 1 time
  use of pronouns: 6 times
  use of subjunctive: 3 times
  use of preposition: 3 times
  subject/verb agreement: 2 times
  verb tense: 2 times
  word order: 1 time
  gender: 1 time
  use of negatives: 1 time
  adjective form: 1 time

VOCABULARY is tested: 31 times
  adjective or adjectival phrase: 5 times (1 FC*)
  adverb or adverbial phrase: 4 times
  noun or noun phrase: 11 times (1 FC*)
  verb or verb phrase: 11 times
*FC = False Cognate

PUNCTUATION is tested: 1 time

SPELLING is tested: 2 times

NO ERROR appears: 5 times
CONTENT ANALYSIS

ENGLISH-SPANISH (EXAM II)

1. a. grammar - ser vs. estar

b. vocabulary - adjective

2. vocabulary - noun phrase 

3. vocabulary - false cognate (noun) 

4. a. vocabulary - verb 

b. grammar - verb form (present vs. present progressive)

5. vocabulary - adverb

6. a. vocabulary - verb 

b. grammar - use of preposition

7. vocabulary - adverbial phrase 

8. a. vocabulary - verb 

b. grammar - use of pronoun (direct vs. indirect object) 

9. a. vocabulary - verb 

b. grammar - preterit vs. imperfect 

10. grammar - various aspects of verb usage 

11. vocabulary - adverbial phrase 

12. vocabulary - adjective phrase 

13. vocabulary - noun 

14. vocabulary - verb phrase 

15. vocabulary - verb phrase 

16. vocabulary - verb phrase 

17. a. grammar - verb form (infinitive vs. present participle) 
b. grammar - use of pronoun (reflexive)

18. vocabulary - noun phrase 

19. vocabulary - adverbial phrase 

20. vocabulary - verb 

21. vocabulary - noun 

22. vocabulary - verb phrase 

23. vocabulary - verb 

24. grammar - use of subjunctive 

25. vocabulary - verb phrase 

26. grammar - use of subjunctive 

27. grammar - use of prepositions 

28. vocabulary - verb phrase 

29. vocabulary - noun phrase 

30. vocabulary - noun 

31. vocabulary - adjective 

32. vocabulary - noun phrase 

33. vocabulary - proverb 

34. vocabulary - verb phrase 

35. vocabulary - verb

36. punctuation - comma 

37. grammar - subject-verb agreement 

38. grammar - use of preposition (por vs. para) 

39. grammar - verb form 

40. grammar - verb tense 

41. grammar - use of subjunctive 

42. grammar - use of preposition 

43. grammar - subject-verb agreement

44. a. grammar - use of pronoun ("éste" as pronoun vs. adjective)
b. spelling - accent 

45. grammar - use of pronoun (direct vs. indirect object)

46. grammar - word order (noun/adjective)

47. grammar - use of pronoun (objective)

48. grammar - gender (determiner) 

49. grammar - use of negatives (conjunction) 

50. No error 

51. grammar - verb tense sequencing 

52. No error 

53. grammar - verb form 

54. spelling 

55. No error 

56. No error 

57. a. grammar - ser vs. estar 

b. grammar - preterit vs. imperfect 

58. vocabulary - false cognate (adjective) 

59. vocabulary - false cognate (noun) 

60. No error 



GRAMMAR is tested: 28 times
  ser vs. estar: 2 times
  verb form: 5 times
  preterit vs. imperfect: 2 times
  use of pronouns: 5 times
  use of subjunctive: 3 times
  use of preposition: 4 times
  subject/verb agreement: 2 times
  verb tense: 2 times
  word order: 1 time
  gender: 1 time
  use of negatives: 1 time

VOCABULARY is tested: 32 times
  adjective or adjectival phrase: 4 times (1 FC)
  adverb or adverbial phrase: 4 times
  noun or noun phrase: 9 times (2 FC)
  verb or verb phrase: 14 times
  proverb: 1 time

PUNCTUATION is tested: 1 time

SPELLING is tested: 2 times

The number of grammar/spelling errors reflects the fact that the classification of
item 54 has not been resolved.

NO ERROR appears: 5 times
FINAL VERSION



SENTENCE ACCURACY SCORING GUIDELINES



0 Translation is less than 50% complete.

1 Many mistranslations, omissions, and/or inappropriate additions, so that much of the 
meaning is lost. 

2 Mistranslation or omission of one or more key terms (including verb tense), and/or 
inappropriate additions. 

3 Mistranslation or omission of one or more minor terms; no inappropriate additions.

4 No mistranslations or omissions, although some nuance may not be conveyed. 

5 All nuances conveyed. 






APPENDIX F 



PARAGRAPH SCORING GUIDELINES



ESVTE PARAGRAPH SCORING GUIDELINES



GRAMMAR* (Structure and Morphology)

0 (Translation less than 50% complete.) 

1 Majority of structures are incorrect. 

2 Some errors in basic structures and numerous errors in complex structures. 

3 Errors in basic structures are rare. Sporadic errors in high frequency complex structures;
some errors in low frequency complex structures.

4 No more than one error in a complex structure. 

5 No grammar errors. 

EXPRESSION (Word Order, Vocabulary, Idiomaticity, Style, and Tone)

0 (Translation less than 50% complete.) 

1 Expression generally equivalent to source language; unacceptable in target language. 

2 Expression closer to source language; generally unacceptable in target language. 

3 Expression usually follows target language conventions, but is not always preferred. 

4 Expression occasionally reveals translation. Appropriate register. 

5 No evidence of translation. 



MECHANICS (Spelling, Accents, Punctuation, and Capitalization) 

0 (Translation less than 50% complete.) 

1 Numerous errors in spelling or punctuation. 

2 Frequent errors in spelling or punctuation.

3 Occasional errors in spelling or punctuation.

4 Rarely makes errors in spelling or punctuation. 

5 Almost no errors in spelling or punctuation.

ACCURACY 

0 (Translation less than 50% complete or accurate.)

1 Many mistranslations, omissions, and/or inappropriate additions, so that much of the
meaning is lost.

2 Mistranslation or omission of one or more key terms (including verb tense), and/or
inappropriate additions.

3 Mistranslation or omission of one or more minor terms; no inappropriate additions.

4 No mistranslations or omissions, although some nuance may not be conveyed.

5 All nuances conveyed.

*Use the information on the following pages as a guide in distinguishing errors in basic, high
frequency complex, and low frequency complex structures.













Source: ETS Oral Proficiency Testing Manual. 1982.
Princeton, NJ: Educational Testing Service, pp.










LS GRAMMAR GRID - SPANISH

LEVEL 0+

VERBS: PRESENT IND.: "ser" verb, 1st person singular. Infinitive forms are to be
expected.

NOUNS, ADJECTIVES, ADVERBS, AND IDIOMS: Some articles indicating concept of
gender & number. ADJ.: Very common ones. ADV.: hoy, mañana, aquí, allí.
QUESTION WORDS: dónde, por qué, cuánto, qué. NEGATIONS: no hablo, etc.

WORD ORDER: Very basic word order. Some verbless sentences are to be expected.

OTHER: Able to answer very simple yes/no questions. Able to name some objects,
colors, days of the week, months. Could be expected to tell time (except 1/2 & 1/4).
Numbers 1 to 20. Names of immediate family members. Limited & isolated vocabulary.

LEVEL 1

VERBS: PRESENT IND.: Regular verbs. Radical changing verbs: tener, poder, querer,
costar. Reflexives: llamarse. Irregulars: poner, ir, haber (hay), saber, hacer
(weather). *ser, *estar (*many mistakes are to be expected). NEAR FUTURE:
ir + a + infinitive.

NOUNS, ADJECTIVES, ADVERBS, AND IDIOMS: AGREEMENT: gender, subject-verb,
although many mistakes are to be expected. ARTICLES: Definite: el, la, los, las.
Indefinite: un, una, unos, unas (some concept of their usage). CONTRACTIONS: al, del.
ADJECTIVES: Possessives, 1st person (mi, mis), 2nd person formal (su, sus).
Qualifying: most common ones. ADJ. & ADV. OF QUANTITY: mucho, poco, bastante,
demasiado. IDIOMATIC EXPRESSIONS: hacer (weather).

WORD ORDER: Position of most common adjectives: la casa grande, el libro azul.

OTHER: Greetings. Tell time (complete). Weather. Order a meal (simple). Make simple
purchases. Handle simple transactions at the post office, bank, drugstore, etc. Can
count up to 1000.
LEVEL 1+

VERBS: IND.: Present: wider range of irregular verbs. Basic reflexive verbs. Basic
knowledge of the differences between ser & estar: SER: physical description,
nationality, profession. ESTAR: location, temporary health condition. Preterite: some
knowledge, mainly 1st & 3rd person singular.

NOUNS, ADJECTIVES, ADVERBS, AND IDIOMS: PRONOUNS: Direct &/or indirect (but not
combined). ADJ.: Demonstrative, possessive. IDIOMATIC EXPRESSIONS: some with tener
(hambre, frío, etc.), tener que.

WORD ORDER: Correct word order for adv. (most common ones).

OTHER: Some autobiographic information. Daily routine. Simple description &
narration. Activities.

LEVEL 2

VERBS: IND.: Present: regular & irregular verbs, reflexive verbs, SABER vs. CONOCER.
Past: imperfect & preterite (some knowledge about the difference between the two);
many mistakes are to be expected. SUBJUNCTIVE: Present, in indirect commands.
CONDITIONAL: Simple. IMPERATIVE.

NOUNS, ADJECTIVES, ADVERBS, AND IDIOMS: ADJ.: Comparative & superlative. NOUNS:
Comparative. PRONOUNS: relative, interrogative, prepositional, direct & indirect (double
object pronouns). PREPOSITIONS: most (por & para limited). Negatives & their
affirmatives: nada, nadie, etc.

WORD ORDER: Correct word order: all pronouns. Position of adj. when change of
meaning occurs: Es un hombre pobre. (poor) Es un pobre hombre. (unfortunate)

OTHER: Good autobiographic information. Good description of daily routine. Some fair
description & narration. Hesitant at times & groping for words.
LEVEL 2+

VERBS: IND.: Preterite vs. imperfect (good command 60% of the time). Future: simple.
PRESENT PROGRESSIVE. PAST PROGRESSIVE. SUBJUNCTIVE: Present (to express hopes,
emotions, uncertainty, doubt, with negative antecedent). SER vs. ESTAR (good command
60% of the time). *The use of …

NOUNS, ADJECTIVES, ADVERBS, AND IDIOMS: ADJ.: Possessive, demonstratives. PREP.:
Rather good control of por & para. IDIOMATIC EXPRESSIONS: acabar de + infinitive;
hace + period of time + preterite (ago).

WORD ORDER: Correct word order of all pronouns & adverbs like ya, todavía, aún.
Position of adj. when change of meaning occurs.

OTHER: Good description & narration. Discussion of current events. Some supported
opinion.

LEVEL 3

VERBS: IND.: Preterite vs. imperfect (good control 70% of the time). Future of
probability (present). All compound tenses. CONDITIONAL: Simple, compound.
SUBJUNCTIVE: Present 80%; present perfect; imperfect 50%; pluperfect. Subjunctive
used with impersonal & adjectival phrases. Compulsory usage with verbs &
conjunctions. Contrary to fact (simple tenses). SER vs. ESTAR (good control 90% of
the time).

NOUNS, ADJECTIVES, ADVERBS, AND IDIOMS: ADV.: ya, todavía, aún (correct usage).
PRONOUNS: Reflexive with se to express an involuntary, unexpected action. Reciprocal
reflexives: Nos escribimos frecuentemente. Some knowledge of the impersonal se; se
instead of the "true" passive. IDIOMATIC EXPRESSIONS: hacía + period of time +
imperfect.

WORD ORDER: Very correct word order, with accurate placement of the pronouns
(simple & double).

OTHER: Some complex descriptions & narrations. Able to express & defend an opinion
on a controversial subject with persons who do not agree. Occasional hesitation in
speaking. Able to rephrase.
LEVEL 3+

VERBS: IND.: Future of probability (past) using "futuro anterior" (future perfect).
SUBJUNCTIVE: Pluperfect: forms & usage in sequence of tenses. "If" clauses (contrary
to fact, compound tenses).

NOUNS, ADJECTIVES, ADVERBS, AND IDIOMS: PREP.: Correct usage of most common ones:
para, por, en, a, de, acerca de, con. Good knowledge of the impersonal se. The use of
se to express the passive voice.

OTHER: Able to answer complex & hypothetical questions. Hardly any limitations.

LEVEL 4

VERBS: Verbs of "devenir" (different ways of expressing the verb to become in
Spanish): hacerse, ponerse, volverse.

NOUNS, ADJECTIVES, ADVERBS, AND IDIOMS: Most frequent idiomatic expressions (good
control). Some less frequent idiomatic expressions.

OTHER: Extensive vocabulary on a wide variety of subjects. Able to switch from
abstract to simple subjects. Able to use different registers.

LEVEL 4+

VERBS: Same as "4".

NOUNS, ADJECTIVES, ADVERBS, AND IDIOMS: Same as "4".

OTHER: Has nearly perfect grammar, extensive vocabulary. Able to use very idiomatic
language. Able to tailor his speech to his audience. Near perfect command of social
registers.

LEVEL 5

Performs like an educated native in all ways. Should be able to discuss any topic or
idea like a native: fluently & accurately. Should be able to understand all native
colloquialisms.



APPENDIX G



PILOT VERSION OF SENTENCE SCORING GRID 



SENTENCE SCORING GRID 



GRAMMAR

0 Less than 50% complete. 

1 One or more errors in basic structures. 

2 One or more errors in high frequency complex structures. 

3 One or more errors in low frequency complex structures.

4 One error in a very low frequency complex structure. 

5 No errors. 

EXPRESSION 

0 Less than 50% complete. 

1 Expression generally equivalent to source language; unacceptable in target language. 

2 Expression closer to source language; generally unacceptable in target language. 

3 Expression follows target language conventions, but is not preferred. 

4 Expression gives subtle indication of translation. Appropriate register.

5 No evidence of translation. 

MECHANICS 

0 Less than 50% complete 

1 Four errors 

2 Three errors 

3 Two errors 

4 One error 

5 No error 

ACCURACY 

0 Less than 50% complete. 

1 Many mistranslations, omissions, and/or inappropriate additions. 

2 Mistranslation or omission of one or more key terms (including verb tense), and/or 
inappropriate additions. 

3 Mistranslation or omission of one or more minor terms; no inappropriate additions. 

4 No mistranslations or omissions, although some nuance may not be conveyed. 

5 All nuances conveyed. 
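In this pilot sentence grid, the MECHANICS score is a direct function of the number of errors, so the rule can be written as a small function. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, and because the grid does not say how to score a sentence with more than four mechanics errors, the sketch assumes such a sentence also receives 1.

```python
def sentence_mechanics_score(num_errors: int, at_least_half_complete: bool = True) -> int:
    """Score mechanics for one sentence on the pilot 0-5 scale (hypothetical helper)."""
    if not at_least_half_complete:
        return 0  # 0 = less than 50% complete
    if num_errors < 0:
        raise ValueError("num_errors must be non-negative")
    # 0 errors -> 5, 1 -> 4, 2 -> 3, 3 -> 2, 4 -> 1.
    # ASSUMPTION: five or more errors are also scored 1; the grid is silent on this case.
    return max(5 - num_errors, 1)


if __name__ == "__main__":
    for errors in range(6):
        print(errors, sentence_mechanics_score(errors))
```

Clamping at 1 keeps 0 reserved for incomplete translations, as in the grid.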
APPENDIX H
PILOT VERSION OF PARAGRAPH SCORING GRID 
PARAGRAPH SCORING GRID (ENGLISH INTO SPANISH)



GRAMMAR*

0 Less than 50% complete. 

1 Majority of structures are incorrect. 

2 Some errors in basic structures and numerous errors in complex structures. 

3 Errors in basic structures are rare. Sporadic errors in high frequency complex structures;
some errors in low frequency complex structures. 

4 No more than one error in a low frequency complex structure. 

5 No grammar errors. 

EXPRESSION 

0 Less than 50% complete. 

1 Expression generally equivalent to source language; unacceptable in target language. 

2 Expression closer to source language; generally unacceptable in target language. 

3 Expression usually follows target language conventions, but is not always preferred. 

4 Expression occasionally reveals translation. Appropriate register. 

5 No evidence of translation. 

MECHANICS 

0 Less than 50% complete 

1 At least 50% correct 

2 At least 70% correct 

3 At least 80% correct 

4 At least 90% correct 

5 At least 99% correct 

ACCURACY 

0 Less than 50% complete. 

1 Many mistranslations, omissions, and/or inappropriate additions. 

2 Mistranslation or omission of one or more key terms (including verb tense), and/or 
inappropriate additions. 

3 Mistranslation or omission of one or more minor terms; no inappropriate additions. 

4 No mistranslations or omissions, although some nuance may not be conveyed. 

5 All nuances conveyed. 
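The pilot paragraph grid instead bands MECHANICS by the percentage of the text that is correct. A minimal sketch of that banding follows, assuming the rater has already computed the percentage (the grid does not say how). The names `MECHANICS_BANDS` and `paragraph_mechanics_score` are hypothetical. The grid also gives no score for a paragraph that is at least 50% complete but less than 50% correct, so the sketch returns `None` for that case rather than guessing.

```python
from typing import Optional

# (minimum percent correct, score) bands from the pilot paragraph grid,
# ordered from the highest band down.
MECHANICS_BANDS = [(99, 5), (90, 4), (80, 3), (70, 2), (50, 1)]


def paragraph_mechanics_score(percent_correct: float,
                              at_least_half_complete: bool = True) -> Optional[int]:
    """Score paragraph mechanics on the pilot 0-5 scale (hypothetical helper)."""
    if not at_least_half_complete:
        return 0  # 0 = less than 50% complete
    for threshold, score in MECHANICS_BANDS:
        if percent_correct >= threshold:
            return score
    # ASSUMPTION: the grid defines no score for a complete paragraph that is
    # under 50% correct, so the sketch returns None and leaves it to the rater.
    return None


if __name__ == "__main__":
    for pct in (100, 95, 85, 72, 55, 40):
        print(pct, paragraph_mechanics_score(pct))
```

Checking the bands from the top down places each percentage in the highest band whose minimum it meets.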

*PLEASE REPORT WHAT YOU CONSIDER THE FOLLOWING TO INCLUDE:
(Use the attached "LS Grammar Grid - Spanish" as a base. I suggest the following distribution
of the levels on the grid. Please let me know if you feel the distribution should be different,
and we can talk about it. Feel free to add to the categories below as you see fit.)

1) BASIC STRUCTURES: (LS Grammar Grid levels 0+ - 2)



2) HIGH FREQUENCY COMPLEX STRUCTURES: (LS Grammar Grid levels 2+ - 3)



3) LOW FREQUENCY COMPLEX STRUCTURES: (LS Grammar Grid levels 3+ - 5)
FBI/CAL TRANSLATION SKILL LEVEL DESCRIPTIONS
July 2€, 1990




EXPRESSION

0+ Makes very frequent mistakes in spelling, punctuation, and
representation of symbols. Uses none or almost none of the
morphology or syntax conventions of the target language. Vocabulary
is extremely limited and frequently inappropriate, even when using a
dictionary. Only very simple sentences are correct. Style and tone
are not identifiable. Renders a translation that appears very
distorted and for the most part is unintelligible.

1 Makes frequent spelling and punctuation errors, frequent grammar 
errors in basic structures, and shows little ability to convey verb 
tenses other than the present tense. Syntax is generally equivalent 
to that of source language. Vocabulary is often inappropriate, even 
when using a dictionary, and active vocabulary is usually limited to 
everyday words and cognates. Renders an extremely literal 
translation, i.e. almost word by word. Has no ability to deal with 
complex sentence patterns. Unable to convey style and tone, unless 
their use in source document is very predictable. Portions of the 
translation are unintelligible and others are clearly distorted; 
however, much of it can be understood by native readers used to 
dealing with foreigners' efforts to translate their language. 

1+ Makes many spelling errors and punctuates according to source language
conventions. Makes many errors in basic grammatical structures, and
uses very few low frequency constructions correctly. Uses syntax
that is very close to that of the source language; vocabulary is
limited, and the translator makes many errors in choice of words,
sometimes even when using a dictionary. Attempts at complex sentences
often result in errors. Uses uneven style and tone that do not reflect
those of the original document. This person's translated documents
appear distorted but are mostly intelligible to native readers used to
dealing with foreigners' efforts to translate their language.

2 Makes spelling errors, while capitalization and punctuation errors 
reflect source language conventions. Uses syntax that is closer to 
source language than to target language. Makes very frequent errors
in low frequency grammatical structures, frequent errors in high
frequency grammatical structures, and some errors in basic structures. 
Vocabulary may be generally too limited to convey abstract thoughts. 
Has only some knowledge of idiomatic expressions and colloquialisms, 
and very limited knowledge of sayings and proverbs. Distorts the 
style and/or the tone of the original document and may inappropriately 
combine use of formal and informal patterns of speech. Produces 
translations that are very literal, but are generally understandable 
to a native reader NOT used to dealing with foreigners' efforts to 
translate their language. 



2+ Makes some spelling errors, and may use capitalization and punctuation
that imitates usage of the source language. Uses syntax that tends to
reflect that of the source language. May make frequent errors in low
frequency complex grammatical structures, some errors in high
frequency complex structures, and occasional errors in basic
structures. Has little ability to use complex sentence patterns.
Vocabulary is adequate to express some abstract thoughts; can often
make sensible guesses about unfamiliar words using linguistic context
and prior knowledge. Has a fair knowledge of idiomatic expressions
and colloquialisms and only limited knowledge of sayings and proverbs.
Tone and style are uneven and somewhat distorted. Produces documents
that are readily understandable but clearly have been translated.

3 Occasionally makes spelling mistakes, makes some grammar mistakes in low
frequency complex structures and sporadic errors in high frequency
complex structures, and shows no pattern of errors in basic structures.
Uses punctuation that is almost identical to that of the source document,
i.e., sometimes atypical of the target language. Moderately good ability
to join or divide original sentences as required by target language
constructions while still retaining the meaning of the source
document. Moderately good ability to use complex structures, sentence
patterns, and vocabulary appropriate for expressing abstract thoughts.
Moderately good knowledge of idiomatic expressions and colloquialisms,
and some sayings and proverbs, but with occasional misunderstandings.
Uses a number of syntactic constructions that are more characteristic
of the source language than the target language, thereby producing
documents that appear to be a translation. This person's style and tone
are even, but occasionally differ slightly from the original.

3+ Makes occasional spelling and punctuation errors. Occasionally makes
grammatical errors in low frequency complex structures and sporadic
errors in high frequency complex structures. Good ability to use very
complex sentence structures. Uses some syntactic structures that are
more typical of the source than the target language, which suggest that
the document is translated. Vocabulary is generally extensive, but usage
is not always precise given the context, especially in the use of
register and colloquialisms. The style and tone of the original
document are not always retained.

4 This person's errors of grammar are very rare and unpatterned. This
person rarely makes a spelling or punctuation error. Uses some
syntactic structures that suggest the document is a translation; while
these are grammatically correct, they are not typical of the target
language. Very good ability to use highly complex sentence
structures. Very good knowledge of idiomatic expressions, register,
colloquialisms, sayings and proverbs, and their equivalents in the
target language. However, a document rendered by this person may
occasionally reveal itself to be a translation due to atypical use of
syntax and vocabulary. The style and tone are equivalent to those of
the source document.



4+ Makes no grammatical or punctuation errors, and no spelling errors
that would not be made by an educated native writer of the target
language. There are minor problems of syntax, spelling, or
vocabulary, which, although grammatically correct, are not typical of
the source language and suggest that the document is a translation.
These and other infelicities could only be confirmed by an educated
native reader of both languages who compares the documents in both
the source language and the target language. Uses style and tone that
are a true reflection of the source document.

5 Produces work that contains no grammar, spelling, or punctuation errors
that would not be made by other well-educated native writers; can
produce documents whose syntax is that of the target language, with
no influence of the source language. Can adapt rhetorical structures so
that the document reads as if it had originally been written in the
target language. Can convey all nuances and can use tone and
stylistic devices that are identical in effect to those of the original,
including use of humor.



ACCURACY

0+ Has no real ability to translate connected discourse. Efforts to
translate contain many mistranslations and omissions, and very little
information from the source document is conveyed.

1 Renders translations whose accuracy is deficient, with frequent
mistranslations and omissions, and may make inappropriate additions.
Much of the information from longer source documents is lost.

1+ Produces translations whose accuracy is inadequate, containing many
mistranslations or omissions, and possibly additions. Almost all
nuances are lost.

2 Produces translations whose accuracy is mostly adequate and without
severe substantive omissions, but without many nuances, and with quite
a few mistranslations. May include some additions for clarification
of areas the translator cannot accurately convey.

2+ Produces translations whose accuracy is adequate, but that contain some
mistranslations or omissions and reflect a limited ability to convey
nuances.

3 Produces translations whose accuracy is good, with occasional minor
mistranslations or omissions. Can handle clearly identifiable
nuances.

3+ Produces translations whose accuracy is very good; there are
occasional omissions or sporadic minor mistranslations; nuances and
subtleties are sometimes conveyed inexactly or not at all.

4 Renders translations whose accuracy is excellent; almost all nuances
are conveyed and there are no mistranslations.

4+ Can produce documents that are totally accurate, convey all nuances,
and are devoid of mistranslations or omissions.

5 Can produce translations that are an exact reflection of the source
document in all aspects, even translating difficult and abstract
prose. Can produce work that is totally accurate, with no
mistranslations or omissions.



Interpretive Information

T-0 NO PROFICIENCY

No ability to translate the language.



T-0+ MEMORIZED PROFICIENCY

Able to translate using only memorized material and expressions,
such as numbers, dates, addresses, some street signs, and shop
designations.



T-1 ELEMENTARY PROFICIENCY

(Base Level) 



Able to translate very simple documents in printed or typed form 
at the survival level such as simple messages and simple notes 

conveying basic instructions. 



T-1+ ELEKENTXSY PROFICIENCY 

(Higher Levei) 



Able to translate simple documents in printed or typed form 
dealing with survival needs and routine social demands such as 
simple letters and biographical data. 



T-2 LIMITED WORKING PROFICIENCY 

(Base Level) 

Able to produce understandable translations of aimple documents 
pertaining to routine social and business correspondence and areas 
of professional experience. 



T-2+ LIMITED WORKING PROFICIENCY 

(Higher Level) 

Able to translate vith some precision most factual, nontechnical 
prose as well- as some documents on concrete topics related to 
fields in which he or she has an interest or background. 



T-3 



OEVERAL PROrSSSIOMAL PROrZCZEMCT 

(Baa* Laval) 



Abla to traaalata aoeaptably aoat fonal aad infozmal vrittan 

axchangaa on praotioal« aooial aad profaaaional topiea. 

Daaonatrataa an aaarging ability to tranalata <Waraa aubjact 
Battar* 



GENERAL PROrESSIONXL PROPICIKMCT 
(Highar Laval) 

Abla to tranalata af faetivaly a variaty of docuaanta daaling with 
divaraa aubjact aattar vithin tha aeopa of paraonal or profaaaional 
axparianea. 



T-4 ADVANCED PROrBSSIOHAL PROPICIBVCT 

(Baaa Laval) 

Abla to tranalata vary affactivaly all forma of docniaanta vithin 
tha acopa of paraonal and professional azparianea, can handla othar 
documenta adequataly. 



*-4+ GENERAL PROFESSIONAL PROPICIENCT 

(Highar Laval) 

Approxiaataa a sastar translator* a ability to produca 
translationa that ara an axact raflaction of tha original docuxaant. 



T-5 (Mastar Translator Proficiency) 

Proficiency equivalent to that of a vall-educated Master 

translator. Able to translate even difficult and abstract proaa; 

for exaople, general technical and legal texts aa wall aa highly 
colloquial writing. 
EXHIBIT A

Paragraph Scoring Grid

[Grid image: rows for levels 1 (1.0), 1+, 2 (2.0), 2+, 3 (3.0), 3+, 4 (4.0) and 4+; cell text illegible.]
EXHIBIT B

QUESTIONNAIRE ON TRANSLATION SKILL LEVELS

Please read the attached information on translation skill levels.
We ask that you examine the criteria, descriptions, and scoring
grid in light of your experience with translation. Your comments
on this material will help us to develop an accurate test of
translation ability. If you require more space than is provided
after each question, please continue your responses on the back.

Section A. Criteria

1. What relationship do you see between ILR reading/writing level
and translation skill level? Do you agree with the assessment of
the relationship described in the criteria?

2. Do you agree with the description of a "perfect" translation?
Why or why not?

3. Are there variables other than those presented that you would
consider in evaluating translation ability? Do you consider any
of the variables presented to be unimportant?
Section B. Translation Level Descriptions

Please read through each skill level description and note any
comments regarding a particular description in your responses to
the questions below. Be sure to indicate the skill level
description and the line within that description that your comment
applies to.

1. Do you think any of the characteristics we have included in
Levels 0-5 are inappropriate to that level? If so, which?

2. Where would you add other characteristics?

3. Would you delete any characteristics from the descriptions?

4. Are there unclear areas in any of the descriptions?

5. Do you agree with the description of a Master Translator
(T-5)? If not, how would you change this description?



Section C. Scoring Grid

This grid is designed to aid scorers in making a decision on the
appropriate skill level description to assign. Please comment on
the grid.

1. Would you find this grid helpful in evaluating a translation?
2. Where would you make changes to the grid?

3. What would you add to the grid?

4. Do you agree with the percentages listed for spelling and
punctuation accuracy? If not, what percentages would you
substitute?

We would welcome any additional comments you might have. Please
use the rest of this page or an additional sheet to comment on any
aspect of this material. Thank you for your valuable assistance
in developing criteria for rating tests of translation ability.

Sincerely,

Charles Stansfield
Marijke Walker
APPENDIX J

TRIALING QUESTIONNAIRE
ON
LANGUAGE BACKGROUND AND PROFICIENCY

Name:
Date:
Test:

Thank you very much for agreeing to take part in the trialing of
the Spanish into English Verbatim Translation Exams. Your comments
about these exams are very important to us. We would like you to
fill out these forms after you have completed each version of the
exam. Please be as clear and frank as possible.

The exact time for completing each section has not yet been
established, but we would like you to work as quickly and accurately
as possible. Please record the time needed to complete each section
on these forms. This will enable us to establish the completion
times for future examinees.

You are not permitted to use a dictionary on any part of this exam
except for the last section, which is entitled "Production Section
III." You are also not permitted to receive or give any assistance
regarding these exams. Your cooperation in these matters is
greatly appreciated.

How do you rate your overall Spanish ability?

How do you rate your overall English ability?
APPENDIX K

EXAM FEEDBACK QUESTIONNAIRE
MULTIPLE CHOICE AND PRODUCTION SECTIONS

Name:
Date:
Test:

Thank you very much for agreeing to take part in the trialing of
the English into Spanish Verbatim Translation Exams. Your comments
about these exams are very important to us. We would like you to
fill out these forms after you have completed each version of the
exam. Please be as clear and frank as possible.

The exact time for completing each section has not yet been
established, but we would like you to work as quickly and accurately
as possible. Please record the time needed to complete each section
on these forms. This will enable us to establish the completion
times for future examinees.

You are not permitted to use a dictionary on any part of this exam
except for the last section, which is entitled "Production Section
III." You are also not permitted to receive or give any assistance
regarding these exams. Your cooperation in these matters is
greatly appreciated.

How do you rate your overall Spanish ability?
Multiple Choice Section I          Completion time: ____ hrs. ____ minutes

1) How could the directions be made clearer?

2) How should questions be modified, if any, so that they are less misleading/confusing?

3) Which questions, if any, do you feel should be deleted?

4) Which questions, if any, do you feel should be added?

5) What unintended errors, if any, did you find in this section?

6) Did this section adequately test your knowledge of Spanish?

7) Were any major points not tested that you feel should have been?

8) Did you feel that this section was: too long / too short / just right?

9) Any additional comments? (Continue on the back, if necessary!!)
Multiple Choice Section II          Completion time: ____ hrs. ____ minutes

2) How should questions be modified, if any, so that they are less misleading/confusing?

3) Which questions, if any, do you feel should be deleted?

4) Which questions, if any, do you feel should be added?

5) What unintended errors, if any, did you find in this section?

6) Did this section adequately test your knowledge of Spanish?

7) Were any major points not tested that you feel should have been?

8) Did you feel that this section was: too long / too short / just right?

9) Any additional comments? (Continue on the back, if necessary!!)
Production Section I          Completion time: ____ hrs. ____ minutes

1) How could the directions be made clearer?

2) How should questions be modified, if any, so that they are less misleading/confusing?

3) Which questions, if any, do you feel should be deleted?

4) Which questions, if any, do you feel should be added?

5) What unintended errors, if any, did you find in this section?

6) Did this section adequately test your knowledge of Spanish?

7) Were any major points not tested that you feel should have been?

8) Did you feel that this section was: too long / too short / just right?

9) Any additional comments? (Continue on the back, if necessary!!)
Production Section II          Completion time: ____ hrs. ____ minutes

2) How should questions be modified, if any, so that they are less misleading/confusing?

3) Which questions, if any, do you feel should be deleted?

4) Which questions, if any, do you feel should be added?

5) What unintended errors, if any, did you find in this section?

6) Did this section adequately test your knowledge of Spanish?

7) Were any major points not tested that you feel should have been?

8) Did you feel that this section was: too long / too short / just right?

9) Any additional comments? (Continue on the back, if necessary!!)
Production Section III          Completion time: ____ hrs. ____ minutes

1) How could the directions be made clearer?

2) How should questions be modified, if any, so that they are less misleading/confusing?

3) Which questions, if any, do you feel should be deleted?

4) Which questions, if any, do you feel should be added?

5) What unintended errors, if any, did you find in this section?

6) Did this section adequately test your knowledge of Spanish?

7) Were any major points not tested that you feel should have been?

8) Did you feel that this section was: too long / too short / just right?

9) Any additional comments? (Continue on the back, if necessary!!)
APPENDIX L

ESVTE EXAM FEEDBACK QUESTIONNAIRE

ENGLISH INTO SPANISH VERBATIM EXAM QUESTIONNAIRE

We would very much appreciate your answers to the following brief questions concerning the
verbatim translation exams you have just taken:

1. Was the length of time given for completing the multiple choice sections about right?

( ) Too short
( ) About right
( ) Too long

2. Was the length of time given for completing the production sections about right?

( ) Too short
( ) About right
( ) Too long

Please indicate to what extent you agree or disagree with the following statements:

3. The directions were clear.

( ) Agree ( ) Disagree

4. The material in the exams was representative of the types of written documents I might
encounter in my work.

( ) Strongly agree ( ) Agree ( ) Disagree ( ) Strongly disagree

5. There was sufficient opportunity for me to demonstrate my ability to translate from
English into Spanish.

( ) Strongly agree ( ) Agree ( ) Disagree ( ) Strongly disagree

Thank you for your cooperation.



APPENDIX M 



PILOT QUESTIONNAIRE AND RESULTS 

ON 

LANGUAGE BACKGROUND AND PROFICIENCY 



Thank you for agreeing to assist in evaluating these tests.

We request that you complete the following information to aid in
our analyses.

Name: ____________________

Profession:

Student

Course of Study:

Translator

Teacher

Other (please specify)

Native Language:

English

Spanish

Other (please specify)

How would you rate your ability to write in English?

Excellent

Very good

Good

Fair

Poor

How would you rate your ability to speak in English?

Excellent

Very good

Good

Fair

Poor

How would you rate your ability to write in Spanish?

Excellent

Very good

Good

Fair

Poor

How would you rate your ability to speak in Spanish?

Excellent

Very good

Good

Fair

Poor



Bachelor's in Spanish 
Master's in Spanish 
Translation Certificate Program 
Other (Please specify) 



QUESTIONNAIRE RESULTS

UNDERGRADUATES

Total Respondents: 45

Native Language:
English: 38
Bilingual Eng-Span: 1
Spanish: 0
Other: 6

All data self-reported

English Writing Ability:

Excellent: 22
Very good: 16
Good: 6
Fair: 1
Poor: 0

English Speaking Ability:

Excellent: 29
Very good: 15
Good: 0
Fair: 1
Poor: 0

Spanish Writing Ability:

Excellent: 1
Very good: 9
Good: 20
Fair: 12
Poor: 3



Spanish Speaking Ability:

Excellent: 2
Very good: 6
Good: 16
Fair: 18
Poor: 3



GRADUATE STUDENTS

Total Respondents: 10

Native Language:
English: 3
Bilingual Eng-Span: 0
Spanish:
Other:

All data self-reported

English Writing Ability:

Excellent: 1
Very good: 6
Good: 3
Fair: 0
Poor: 0

English Speaking Ability:

Excellent: 3
Very good: 4
Good: 3
Fair: 0
Poor: 0
Spanish Writing Ability:

Very good: 3
Good: 1
Fair: 2
Poor: 0

Spanish Speaking Ability:

Excellent: 5
Very good: 2
Good: 2
Fair: 1
Poor: 0
SELF-ASSESSMENT QUESTIONNAIRE
AND
SUMMARY REPORT ON SELF-ASSESSMENT

NAME

FIELD OFFICE

SELF-ASSESSMENT OF TRANSLATION ABILITY

The purpose of this questionnaire is to learn your candid evaluation of your ability to translate written
documents from ENGLISH INTO SPANISH. It is of the utmost importance that you provide an honest
evaluation of your present abilities so that the effectiveness of the translation exams may be accurately and fully
assessed. Please be assured that your responses will be kept confidential by the test development contractor and
will in no way affect your standing or possibility of advancement within the Bureau.

Instructions: Please estimate your ability to translate the following types of documents using the scale provided
below:

Limited      The translated document contains many mistranslations and omissions, and frequent errors in
             grammar. The translation is extremely literal (i.e., word for word) and may be difficult to
             understand.

Functional   The translation is fairly accurate with no substantive omissions; however, it may contain some
             mistranslations and grammar errors. The translation is literal but generally understandable.

Competent    The accuracy of the translated document is good, with occasional minor mistranslations and
             omissions. There is no pattern of grammar errors. Most idiomatic expressions are used
             appropriately; however, the phrasing may reveal the document to be a translation.

Superior     The accuracy of the translation is excellent, with most nuances conveyed. Grammar errors are rare.
             The phrasing is entirely natural and the document does not appear to be a translation.

Please candidly evaluate your ability to translate each of the following types of documents from English into
Spanish by circling the appropriate label. If you have never translated a particular type of document, please
mark N/A ("not applicable").



1. FBI forms                       Limited   Functional   Competent   Superior   N/A
2. Depositions                     Limited   Functional   Competent   Superior   N/A
3. Police reports                  Limited   Functional   Competent   Superior   N/A
4. Correspondence                  Limited   Functional   Competent   Superior   N/A
5. Legal documents                 Limited   Functional   Competent   Superior   N/A
6. Press releases                  Limited   Functional   Competent   Superior   N/A
7. FCI status/evaluation reports   Limited   Functional   Competent   Superior   N/A
8. Scientific/technical articles   Limited   Functional   Competent   Superior   N/A
9. Foreign diplomatic reports      Limited   Functional   Competent   Superior   N/A
10. Training manuals               Limited   Functional   Competent   Superior   N/A
11. ______________________         Limited   Functional   Competent   Superior   N/A
    (Please specify)
NAME

FIELD OFFICE

SELF-ASSESSMENT OF TRANSLATION ABILITY

The purpose of this questionnaire is to learn your candid evaluation of your ability to translate written
documents from SPANISH INTO ENGLISH. It is of the utmost importance that you provide an honest
evaluation of your present abilities so that the effectiveness of the translation exams may be accurately and fully
assessed. Please be assured that your responses will be kept confidential by the test development contractor and
will in no way affect your standing or possibility of advancement within the Bureau.

Instructions: Please estimate your ability to translate the following types of documents using the scale provided
below:

Limited      The translated document contains many mistranslations and omissions, and frequent errors in
             grammar. The translation is extremely literal (i.e., word for word) and may be difficult to
             understand.

Functional   The translation is fairly accurate with no substantive omissions; however, it may contain some
             mistranslations and grammar errors. The translation is literal but generally understandable.

Competent    The accuracy of the translated document is good, with occasional minor mistranslations and
             omissions. There is no pattern of grammar errors. Most idiomatic expressions are used
             appropriately; however, the phrasing may reveal the document to be a translation.

Superior     The accuracy of the translation is excellent, with most nuances conveyed. Grammar errors are rare.
             The phrasing is entirely natural and the document does not appear to be a translation.

Please candidly evaluate your ability to translate each of the following types of documents from Spanish into
English by circling the appropriate label. If you have never translated a particular type of document, please
mark N/A ("not applicable").



1. Newspaper articles              Limited   Functional   Competent   Superior   N/A
2. Newspaper editorials            Limited   Functional   Competent   Superior   N/A
3. Depositions                     Limited   Functional   Competent   Superior   N/A
4. Police reports                  Limited   Functional   Competent   Superior   N/A
5. Correspondence                  Limited   Functional   Competent   Superior   N/A
6. Legal documents                 Limited   Functional   Competent   Superior   N/A
7. Letters rogatory                Limited   Functional   Competent   Superior   N/A
8. Case histories                  Limited   Functional   Competent   Superior   N/A
9. FCI status/evaluation reports   Limited   Functional   Competent   Superior   N/A
10. Scientific/technical articles  Limited   Functional   Competent   Superior   N/A
11. Foreign diplomatic reports     Limited   Functional   Competent   Superior   N/A
12. Training manuals               Limited   Functional   Competent   Superior   N/A
13. ______________________         Limited   Functional   Competent   Superior   N/A
    (Please specify)
SUMMARY REPORT ON SELF-ASSESSMENT: ENGLISH TO SPANISH

The following section consists of an analysis of the results
of the English-to-Spanish Self-Assessment Questionnaire, which was
completed by FBI personnel participating in the validation study.

This section specifies:

1. the document type which the participants checked most
frequently;

2. the average rating for each document type;

3. the percent of total respondents who gave a response
for each document type;

4. the document types which correlated most significantly
with the FBI translation skill level descriptions.



AVERAGE RATING OF EACH DOCUMENT TYPE 

The questionnaire listed ten document types, shown below, and
asked each employee to rate his or her ability to translate each
type on a four-point scale: 4, superior; 3, competent; 2,
functional; and 1, limited. There were 35 respondents to the
English-to-Spanish questionnaire. The table below gives the
percent who responded to each document type and the average
self-rating, ranked in descending order of average self-rating.



DOCUMENT TYPE                            % RESPONDING   AVERAGE SELF-RATING

1. ESCORRES (correspondence)                  97              3.11
2. ESPOLRPT (police reports)                  69              3.04
3. ESFBI (FBI forms)                          71              2.96
4. ESPRESS (press releases)                   69              2.91
5. ESDEPOS (depositions)                      60              2.85
6. ESTRNG (training manuals)                  57              2.85
7. ESDIPL (foreign diplomatic reports)        46              2.75
8. ESFCI (FCI reports)                        51              2.72
9. ESLEGAL (legal documents)                  69              2.58
10. ESTECH (technical documents)              54              2.57



The self-rating most frequently chosen was COMPETENT. The lowest
average self-ratings, for technical documents, legal documents,
and FCI reports, suggest that respondents regarded these types as
the most difficult to translate. Conversely, they appear to have
regarded correspondence and police reports as the easiest.
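Both columns of the table can be reproduced from raw questionnaire responses. The sketch below is a minimal illustration, assuming each response is stored as the circled label with None for N/A: N/A answers are dropped, the remaining labels are mapped to the four-point scale (4 superior, 3 competent, 2 functional, 1 limited), and the percent responding is computed against all 35 respondents. The function name `summarize` and the sample responses are hypothetical, not study data.

```python
from typing import Optional, Sequence, Tuple

# Four-point scale used in the self-assessment questionnaire.
SCALE = {"limited": 1, "functional": 2, "competent": 3, "superior": 4}


def summarize(responses: Sequence[Optional[str]],
              total_respondents: int) -> Tuple[float, Optional[float]]:
    """Return (percent responding, average self-rating) for one document type.

    Each entry in `responses` is a circled label, or None for N/A
    (a hypothetical encoding). N/A answers are excluded from the average.
    """
    ratings = [SCALE[label.lower()] for label in responses if label is not None]
    percent = 100.0 * len(ratings) / total_respondents
    average = sum(ratings) / len(ratings) if ratings else None
    return percent, average


# Hypothetical example: 34 of 35 employees rate one document type.
sample = ["Competent"] * 20 + ["Superior"] * 10 + ["Functional"] * 4 + [None]
pct, avg = summarize(sample, total_respondents=35)
print(f"{pct:.0f}% responding, average self-rating {avg:.2f}")
```

With 34 of 35 employees responding, the percent responding rounds to 97, the figure reported for correspondence.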
CORRELATIONS WITH OVERALL SCORES

The table below presents the correlations of each document
type with the overall scores for Expression and Accuracy (EXPF1
and EXPF2: Expression totals on Forms 1 and 2; ACCF1 and ACCF2:
Accuracy totals on Forms 1 and 2). The number of paired scores is
listed in parentheses below each correlation:



DOCTYPE      EXPF1    EXPF2    ACCF1    ACCF2

ESFBIFRM     0.31     0.13     0.56*    0.64*
             (25)     (24)     (25)     (24)

ESDEPOS      0.38     0.21     0.54*    0.52*
             (21)     (20)     (21)     (20)

ESPOLRPT     0.49*    0.36     0.45*    0.59*
             (24)     (23)     (24)     (23)

ESCORRES     0.30     0.22     0.34*    0.53*
             (33)     (33)     (34)     (33)

ESLEGAL      0.26     0.22     0.41*    0.43*
             (24)     (23)     (24)     (23)

ESPRESS      0.42*    0.25     0.45*    0.51*
             (24)     (23)     (24)     (23)

ESFCI        0.43     0.21     0.57*    0.51*
             (18)     (18)     (19)     (18)

ESTECH       0.28     0.13     0.28     0.32
             (19)     (18)     (19)     (18)

ESDIPL       0.39     0.19     0.56*    0.47
             (16)     (16)     (16)     (16)

ESTRNG       0.55*    0.34     0.42     0.53*
             (20)     (19)     (20)     (19)

*p < .05

Ranked in descending order, the document types with significant
correlations with the expression totals on Form 1 were training
manuals (.55), police reports (.49), and press releases (.42). No
significant expression correlations appeared for Form 2, although
the rank order of the Form 2 correlations is broadly similar to
that for Form 1.

The document types showing the highest correlations with the
accuracy totals for Form 1 were FCI reports (.57), FBI forms and
foreign diplomatic reports (.56 each), and depositions (.54). On
Form 2, they were FBI forms (.64), police reports (.59),
correspondence and training manuals (.53 each), depositions (.52),
and press releases and FCI reports (.51 each). On the whole, the
accuracy correlations were higher than the expression correlations
for the English-to-Spanish self-assessments.
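The report marks correlations significant at p < .05 but does not state which test was applied. The starred pattern appears consistent with the usual two-tailed test of r against zero, which converts r to t = r√((n − 2)/(1 − r²)) with n − 2 degrees of freedom: for example, 0.56 with 16 pairs is starred while 0.47 with 16 pairs is not. The sketch below computes that p-value with the standard library only, by integrating the Student t density with Simpson's rule; treat it as an illustration under that assumption, not as the procedure the authors used. The function names are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(r: float, n: int, steps: int = 2000) -> float:
    """Two-tailed p-value for H0: rho = 0, given Pearson r from n pairs."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 pairs")
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for P(0 <= T <= t); `steps` must be even.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 2 * (0.5 - area)


# ESDIPL row: ACCF1 (0.56, 16 pairs) is starred, ACCF2 (0.47, 16 pairs) is not.
for r, n in [(0.56, 16), (0.47, 16)]:
    print(r, n, round(two_tailed_p(r, n), 3))
```

The first p-value falls below .05 and the second above it, matching the asterisks in the table.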
APPENDIX O



CONVERSION TABLES: RAW SCORE TO TSL SCORE 
EXPRESSION AND ACCURACY 
Form 1 - ESVTE
Conversion Table - ESVTE
Expression Raw Score TSL Score

1 *
2 *
3 *
4 *
5 *
6 *
7 *
8 *
9 *
10 *
11 *
12 *
13 *
14 *
15 *
16 0.3
17 0.4
18 0.4
19 0.5
20 0.5
21 0.5
22 0.6
23 0.6
24 0.7
25 0.7
26 0.8
27 0.8
28 0.8
29 0.9
30 0.9
31 1.0
32 1.0
33 1.1
34 1.1
35 1.1
36 1.2
37 1.2
38 1.3
39 1.3
40 1.4
41 1.4
42 1.5
43 1.5
44 1.5
45 1.6
46 1.6

* 1-15 = chance scores




Form 1 - ESVTE 



Expression Raw Score TSL Score 

47 1.7 

48 1.7 

49 1.8 

50 1.8 

51 1.8 

52 1.9

53 1.9 

54 2.0 

55 2.0 

56 2.1 

57 2.1 

58 2.2 

59 2.2 

60 2.2 

61 2.3 

62 2.3 

63 2.4 

64 2.4 

65 2.5 

66 2.5 

67 2.5 

68 2.6 

69 2.6

70 2.7 

71 2.7 

72 2.8 

73 2.8 

74 2.9 

75 2.9 

76 2.9 

77 3.0 

78 3.0 

79 3.1 

80 3.1 

81 3.2 

82 3.2 

83 3.2 

84 3.3 

85 3.3 

86 3.4 

87 3.4 

88 3.5 

89 3.5 

90 3.6 

91 3.6 

92 3.6 

93 3.7 

94 3.7 

95 3.8 

96 3.8 

97 3.9 






Form 1 - ESVTE 



Expression Raw Score TSL Score 

98 3.9 

99 3.9 

100 4.0 

101 4.0 

102 4.1 

103 4.1 

104 4.2 

105 4.2 



Conversion Tables 



Accuracy Raw Score TSL Score 

1 0.5 

2 0.6 

3 0.6 

4 0.7 

5 0.7 

6 0.8 

7 0.8 

8 0.9 

9 0.9 

10 1.0 

11 1.0 

12 1.0 

13 1.1 

14 1.1 

15 1.2 

16 1.2 

17 1.3 

18 1.3 

19 1.4 

20 1.4 

21 1.5 

22 1.5 

23 1.6 

24 1.6 

25 1.7 

26 1.7 

27 1.8 

28 1.8 

29 1.9 

30 1.9 

31 2.0 

32 2.0 

33 2.0 

34 2.1 

35 2.1 

36 2.2 

37 2.2 

38 2.3 

39 2.3 

40 2.4 

41 2.4 

42 2.5 

43 2.5 

44 2.6 

45 2.6 
Form 1 - ESVTE

Accuracy Raw Score TSL Score

46 2.7
47 2.7
48 2.8
49 2.8
50 2.9
51 2.9
52 2.9
53 3.0
54 3.0
55 3.1
56 3.1
57 3.2
58 3.2
59 3.3
60 3.3
61 3.4
62 3.4
63 3.5
64 3.5
65 3.6
66 3.6
67 3.7
68 3.7
69 3.8
70 3.8
71 3.9
72 3.9
73 3.9
74 4.0
75 4.0
76 4.1
77 4.1
78 4.2
79 4.2
80 4.3
Form 2 - ESVTE

Conversion Table
Expression Raw Score TSL Score

1 *
2 *
3 *
4 *
5 *
6 *
7 *
8 *
9 *
10 *
11 *
12 *
13 *
14 *
15 *
16 0.5
17 0.5
18 0.6
19 0.6
20 0.7
21 0.7
22 0.8
23 0.8
24 0.8
25 0.9
26 0.9
27 1.0
28 1.0
29 1.0
30 1.1
31 1.1
32 1.2
33 1.2
34 1.3
35 1.3
36 1.3
37 1.4
38 1.4
39 1.5
40 1.5
41 1.6
42 1.6
43 1.6
44 1.7
45 1.7
46 1.8

* 1-15 = chance scores
Form 2 - ESVTE

Expression Raw Score TSL Score

47 1.8 

48 1.8 

49 1.9 

50 1.9 

51 2.0 

52 2.0 

53 2.1 

54 2.1 

55 2.1 

56 2.2 

57 2.2 

58 2.3 

59 2.3 

60 2.3 

61 2.4 

62 2.4 

63 2.5 

64 2.5 

65 2.6 

66 2.6 

67 2.6 

68 2.7 

69 2.7 

70 2.8 

71 2.8 

72 2.8 

73 2.9 

74 2.9 

75 3.0 

76 3.0 

77 3.1 

78 3.1 

79 3.1 

80 3.2 

81 3.2 

82 3.3 

83 3.3 

84 3.4 

85 3.4 

86 3.4 

87 3.5 

88 3.5 

89 3.6 

90 3.6 

91 3.6 

92 3.7 

93 3.7 

94 3.8 

95 3.8

96 3.9 

97 3.9 
Form 2 - ESVTE



Expression Raw Score TSL Score 

98 3.9 

99 4.0 

100 4.0 

101 4.1 

102 4.1 

103 4.1 

104 4.2 

105 4.2 
Form 2 - ESVTE

Conversion Tables - Form 2
Accuracy Raw Score TSL Score

1 0.5
2 0.6
3 0.6
4 0.7
5 0.7
6 0.8
7 0.8
8 0.9
9 0.9
10 1.0
11 1.0
12 1.1
13 1.1
14 1.1
15 1.2
16 1.2
17 1.3
18 1.3
19 1.4
20 1.4
21 1.5
22 1.5
23 1.6
24 1.6
25 1.7
26 1.7
27 1.8
28 1.8
29 1.9
30 1.9
31 2.0
32 2.0
33 2.1
34 2.1
35 2.2
36 2.2
37 2.3
38 2.3
39 2.4
40 2.4
41 2.5
42 2.5
43 2.6
44 2.6
45 2.7
Form 2 - ESVTE

Accuracy Raw Score TSL Score

46 2.7
47
48 2.8
49 2.9
50 2.9
51
52 3.0
53 3.1
54 3.1
55 3.2
56 3.2
57 3.3
58 3.3
59 3.4
60 3.4
61 3.5
62 3.5
63 3.6
64 3.6
65 3.7
66 3.7
67 3.8
68 3.8
69 3.9
70 3.9
71 4.0
72 4.0
73 4.1
74 4.1
75 4.2
76 4.2
77 4.3
78 4.3
79 4.4
80 4.4
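Applying these tables is a direct lookup from raw score to TSL score, with raw Expression scores of 1-15 flagged as chance scores that carry no TSL equivalent (the Accuracy tables have no chance band). The sketch below assumes the tables are held as Python dictionaries; only a few rows from the Form 1 tables are copied in, and `to_tsl` is a hypothetical helper name.

```python
from typing import Dict, Optional

# A few rows copied from the Form 1 tables (raw score -> TSL score).
FORM1_EXPRESSION: Dict[int, float] = {16: 0.3, 31: 1.0, 54: 2.0, 77: 3.0, 100: 4.0, 105: 4.2}
FORM1_ACCURACY: Dict[int, float] = {1: 0.5, 31: 2.0, 53: 3.0, 74: 4.0, 80: 4.3}

EXPRESSION_CHANCE_MAX = 15  # raw Expression scores 1-15 are chance scores


def to_tsl(raw: int, table: Dict[int, float], chance_max: int = 0) -> Optional[float]:
    """Return the TSL score for `raw`, or None if it falls in the chance band."""
    if raw <= chance_max:
        return None
    if raw not in table:
        raise KeyError(f"raw score {raw} is not in this conversion table")
    return table[raw]


print(to_tsl(12, FORM1_EXPRESSION, EXPRESSION_CHANCE_MAX))  # chance score
print(to_tsl(77, FORM1_EXPRESSION, EXPRESSION_CHANCE_MAX))
print(to_tsl(53, FORM1_ACCURACY))
```

Storing the full tables as data, rather than recomputing a formula at scoring time, reproduces the published rounding exactly.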
APPENDIX P

MEMORANDUM ON TOTAL SCORE CONVERSION
TO
ILR EQUIVALENCY RATING

Memo

From: Charles Stansfield
Date: May 15, 1990

Subject: Total score conversion to ILR equivalency rating



As I indicated to you on the phone, we have encountered a
problem in converting the total score on the test to an ILR-like
Translation Rating. Each examinee took two forms of the test, and
each examinee was given an overall ILR-like rating by each of two
raters based on the examinee's performance on each test. The
raters assigned ratings for Accuracy and Expression. Thus, each
examinee received four estimates of his ILR level (two estimates
per form) for accuracy and four estimates of his ILR level for
expression.

We averaged the four estimates of ILR rating to come up with
an overall Translation rating. We then correlated the test scores
with the Translation rating. The high correlation (an average of
.90) allowed us to use the resulting regression equation to predict
Translation rating from the total score on the test. Thus, we were
able to construct a score conversion table for all points on the
test scale, which would produce an estimated Translation skill
level.
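The procedure in the preceding paragraph (average the four rater-by-form estimates, regress that average on the total test score, then tabulate the predicted rating for every possible raw score) can be sketched as follows. The memo does not give the regression coefficients, so this assumes an ordinary least-squares line; the five examinees and their ratings are invented for illustration, and all function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def overall_rating(estimates: Sequence[float]) -> float:
    """Average the rater-by-form estimates (two raters x two forms)."""
    return mean(estimates)


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def conversion_table(slope: float, intercept: float,
                     low: int, high: int) -> Dict[int, float]:
    """Predicted rating, rounded to one decimal, for each raw score low..high."""
    return {s: round(intercept + slope * s, 1) for s in range(low, high + 1)}


# Hypothetical examinees: total score and four rating estimates each.
scores: List[int] = [20, 40, 60, 80, 100]
estimates = [
    [1.0, 1.5, 0.5, 1.0],
    [2.0, 2.0, 2.5, 1.5],
    [2.5, 3.0, 2.0, 2.5],
    [3.5, 3.5, 3.0, 4.0],
    [4.0, 4.5, 3.5, 4.0],
]
ratings = [overall_rating(e) for e in estimates]  # [1.0, 2.0, 2.5, 3.5, 4.0]
slope, intercept = fit_line(scores, ratings)
table = conversion_table(slope, intercept, 16, 105)
print(f"slope {slope:.4f}, intercept {intercept:.2f}, raw 60 -> {table[60]}")
```

Rounding to one decimal mirrors the one-decimal TSL scores in the conversion tables; the real tables would come from the study's own fitted equation.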

One of the problems with such conversion tables is a
phenomenon known as the "regression effect" (a different meaning
from the use of regression above). The regression effect means that
examinees whose first score is far from the mean will be predicted
to be closer to the mean on the second score. Thus, most examinees
whose score on our test is at the top of the distribution will be
predicted to have a lower ILR score than they received from the
raters. Similarly, most examinees whose score on our test is at
the bottom of the distribution will be predicted to have a higher
ILR score than they received from the raters.
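For a least-squares prediction line, the spread of the predicted ratings equals |r| times the spread of the observed ratings, so whenever r is below 1 the predictions are pulled toward the mean: top scorers tend to be predicted below their obtained ratings and bottom scorers above them. With a correlation of about .90, as reported here, the predicted ratings would spread roughly nine-tenths as widely as the raters' ratings. The sketch below illustrates this with five invented examinees; the data and function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def predicted(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Least-squares predictions of y from x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [my + slope * (a - mx) for a in x]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical total scores and averaged ratings.
scores = [30, 45, 60, 75, 90]
ratings = [1.0, 2.2, 2.6, 3.0, 4.2]
preds = predicted(scores, ratings)
r = pearson_r(scores, ratings)

print(f"r = {r:.3f}")
print(f"spread of ratings {pstdev(ratings):.3f}, of predictions {pstdev(preds):.3f}")
print(f"lowest scorer: rated {ratings[0]}, predicted {preds[0]:.2f}")
print(f"highest scorer: rated {ratings[-1]}, predicted {preds[-1]:.2f}")
```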

Attached is a copy of the scatterplot for 42 FBI examinees.
The ILR expression rating is on the vertical axis, while the total
expression score on our test (ESVTE) is on the horizontal axis.
We have drawn in the regression line with a pencil. This is the
straight line that best fits the distribution: for any other line,
if you calculated the deviations between the obtained scores and
the scores predicted by that line, the sum of the squared
deviations would be greater than it is for the regression line.
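The "best fit" criterion here is least squares: the regression line minimizes the sum of squared deviations between obtained and predicted ratings, while the plain signed deviations from that line always sum to zero. The sketch below checks both properties on invented data by tilting and shifting the fitted line slightly; the data and function names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def sse(x: Sequence[float], y: Sequence[float],
        slope: float, intercept: float) -> float:
    """Sum of squared deviations of obtained from predicted values."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


scores = [30, 45, 60, 75, 90]          # hypothetical test scores
ratings = [1.0, 2.2, 2.6, 3.0, 4.2]    # hypothetical ratings
slope, intercept = ols(scores, ratings)
best = sse(scores, ratings, slope, intercept)

# Any other line, here small tilts or shifts of the fitted one, does worse.
for d_slope, d_int in [(0.005, 0.0), (-0.005, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert sse(scores, ratings, slope + d_slope, intercept + d_int) > best

residual_sum = sum(y - (intercept + slope * x) for x, y in zip(scores, ratings))
print(f"minimum SSE {best:.4f}; signed deviations sum to {residual_sum:.1e}")
```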

On this scatterplot each A represents one examinee. Each B
represents two examinees. As indicated in the note at the bottom,
14 examinees' scores are not on the scatterplot because their
scores and the regression line coincided. Thus, for these
examinees, the conversion table worked perfectly. The asterisks
are the computer's representation of the regression line. In this
scatterplot you will see some tendency for the deviations between
the actual and predicted scores to be quite small near the center
of the distribution and larger at the ends. You will also see
some tendency for examinees who scored above 80 on the ESVTE to
have a predicted score that is lower than their obtained score.
Similarly, for examinees who scored below 40, the predicted score
is usually higher than the obtained score. Thus, more of the
obtained scores for these people are below the regression line than
above it.

One consequence of the regression effect is to restrict the range of
ability measured by the test. That is, the highest-ability
examinee on this test obtained a rating of 4.5, but the conversion
table predicts his skill level to be 3.8. This person
was probably one of the three professional translators who took the
test.

One option we have, which would reduce the regression effect
described in paragraph three above, is to tilt the regression line
to the left by transforming the scores so that the maximum ILR
score level is higher, 4.5 for example. However, we have no basis
other than intuition for doing this. That is, the sample did not
contain people whom we knew beforehand to be at the 4.5 level or
higher. While this seems reasonable, in that it reduces the
regression effect, it also slightly increases the amount of error
in the predicted ILR scores all along the continuum. Thus, it
seems unwise.
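The concern that tilting the line adds error all along the continuum can be illustrated. Because the least-squares line already minimizes squared error among all straight lines, any linear rescaling of its predictions (for example, stretching them so the top predicted rating reaches 4.5) must give a larger sum of squared errors on the same data. The stretch-about-the-mean rule and the toy data below are assumptions for illustration, not the transformation the authors actually considered.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def stretch(preds, new_max):
    """Hypothetical tilt: stretch predictions linearly about their mean
    so that the largest prediction becomes new_max."""
    m = sum(preds) / len(preds)
    k = (new_max - m) / (max(preds) - m)
    return [m + k * (p - m) for p in preds]


def sse(obs, preds):
    """Sum of squared differences between obtained and predicted ratings."""
    return sum((o - p) ** 2 for o, p in zip(obs, preds))


# Hypothetical toy data (test score, obtained rating).
x = [20, 35, 50, 65, 80]
y = [1.2, 1.7, 2.6, 2.9, 3.8]
a, b = fit_line(x, y)
ols = [a + b * xi for xi in x]
tilted = stretch(ols, 4.5)
print(f"top prediction {max(tilted):.2f}; SSE {sse(y, ols):.4f} -> {sse(y, tilted):.4f}")
```

The stretched predictions are still a straight-line function of the test score, so they cannot beat the least-squares line; the extra error is the price of reaching 4.5 at the top.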

Another option is to have several people whom we know to be level
4+ and 5 translators take the test, and enter their results
into the equation. This would have to be done later, however. So,
that's our dilemma. As it stands, no one in the sample would earn
a predicted ILR rating above 3+, and because of the lack of
high-ability examinees in the sample, it is not possible to earn a
rating higher than 4.2 on the test, even though we believe it to
be sensitive to differences in ability in the 4-5 range. Further
evidence that the test could discriminate in that range is found
in the fact that the highest raw Expression score on the test was
S8 on the ESVTE and S6 on the SEVTE, while the maximum possible
total score was 105. Similarly, for Accuracy, the highest raw
score was 71 on the SEVTE and 75 on the ESVTE, while the maximum
possible total score was 80. Thus, the difficulty level of the
test exceeds the ability level of any examinee in the sample.

As a future project, we should think about how we can identify
at least 10 high-level translators and then administer the tests
to them. We would then be able to revise the score conversion
table so that the ILR ratings for high-ability candidates are more
accurate than at present, and so that the test will measure ability
up to a higher level than at present.

For the moment, it may be best to leave the conversion table
as is. However, if this conversion table is used, test score users
should be aware that it may underpredict the true level of
examinees whose predicted ILR rating is 3.5 or above. This
information should be incorporated in any test manual that you
prepare.



In general, I find this disappointing. We tried to make the
test hard enough to measure ability as high as level 5. However,
because 5's did not show up in the sample, the test appears to fail
to measure at such a high level.

On a more positive note, I should say that the test seems to
predict the average Translation skill level rating assigned by our
raters very accurately in the 1.8 to 3.5 range, which is the
range in which most FBI personnel scored.

I should mention one more concern. All 17 FBI
employees for whom we had Translation level ratings on the FBI's
current translation test received a lower Translation rating on our
test than on the FBI test. The average difference was about half
a full level, with differences typically being larger for examinees
whose FBI test score was 3.8 or above and smaller for
examinees whose FBI test score was 2.8 or below. Thus, either a.)
the FBI's current test is too generous, or b.) our raters are too
severe, or c.) the time constraints on our test do not permit the
examinees to revise their translations and demonstrate their true
ability, or d.) the examinees were not motivated to give their best
performance when they took our test, or e.) the examinees' true
Translation ability declined after they took the FBI test.
Do you have any thoughts about a.) or e.) above?
[Figure: ENSP Form 2 scatterplot, EXPILR12 predicted from EXPTOTF2 (SAS output, 13:57 Tuesday, May 15, 1990). Plot of EXPILR12*EXPTOTF2, legend: A = 1 obs, B = 2 obs, etc.; plot of PRED*EXPTOTF2, symbol used is "*".]
Dear Language Specialist:

[...] has contracted with the Center for Applied Linguistics [...]
[...] test, Spanish into English and English into Spanish [...]
[...] the actual linguistic tasks carried out by Language Specialists. Therefore, we would [...]
[...] your input [...] you to fill out the attached questionnaire [...]
[...] the appropriate column. As concerns [...]
[...] an assignment that was performed once or that is performed only rarely [...]

Thank you for your help.

larijke Walker
Testing Program Manager
Language Services Unit
FBIHQ, Room 3505

Phone HOx4160
FROM ENGLISH TO SPANISH

I. ORAL TASKS

% OF YOUR TIME

Interpretation Assignments
Check as many as are applicable

unannounced visitors

tours

conferences

other (please specify)

Oral Proficiency Test (Spanish)

TASKS INVOLVING WRITTEN MATERIAL

% OF YOUR TIME TRANSLATING
% OF YOUR TIME SUMMARIZING

Legal Documents

Check as many as are applicable

letters rogatory

extradition requests

laws, violations, [...]

wanted posters

other (please specify)

Booklets/Manuals

Check as many as are applicable

science/technology

[illegible]

training

other (please specify)

Forms

Check as many as are applicable

DOJ forms

other (please specify)

Other (please specify)

% OF YOUR TIME SPENT IN TRANSLATING
% OF YOUR TIME SPENT IN SUMMARIZING

Recorded Conversations:
TELEPHONE

CHECK AS MANY AS ARE APPLICABLE:

politics

general theft/white collar crime

science/technology

[illegible]

BODY RECORDER

CHECK AS MANY AS ARE APPLICABLE:

general theft/white collar crime

narcotics trafficking

foreign counterintelligence

military

[several items illegible]

other (please specify)

Other (please specify):

% OF YOUR TIME SPENT IN TRANSLATING
% OF YOUR TIME SPENT IN SUMMARIZING

Medical Reports

CHECK AS MANY AS ARE APPLICABLE:

Patents

other (please specify):

IV. TASKS INVOLVING LISTENING

Broadcasts:

% OF YOUR TIME SPENT IN TRANSLATING
% OF YOUR TIME SPENT IN SUMMARIZING

CHECK AS MANY AS ARE APPLICABLE:

politics

business/finance

economics

general theft/white collar crime

organized crime

narcotics trafficking

science/technology

military

legal
% OF YOUR TIME SPENT IN TRANSLATING
% OF YOUR TIME SPENT IN SUMMARIZING

Domestic/International Terrorism

CHECK AS MANY AS ARE APPLICABLE:

status and evaluation reports

case histories

police records

court records

[...] travel documents

other (please specify)

Foreign Counterintelligence

CHECK AS MANY AS ARE APPLICABLE:

[illegible]

other (please specify)

Treaty Requests/Letters Rogatory

Scientific/Technical

CHECK AS MANY AS ARE APPLICABLE:

biology

explosive or incendiary devices

weapons

automobiles/other vehicles

other (please specify)
Letters to the Director
and other FBI officials

Teletype

(TRANSLATION ONLY)

% OF YOUR TIME
SPENT IN TRANSLATING

% OF YOUR TIME
SPENT IN SUMMARIZING

Legal/Technical

General Theft/White Collar Crime

CHECK AS MANY AS ARE APPLICABLE:

bank records

court records

other (please specify)

Organized Crime

CHECK AS MANY AS ARE APPLICABLE:

[...] records

police reports

court records

Narcotics Trafficking

CHECK AS MANY AS ARE APPLICABLE:

[...] reports

court records
QUESTIONNAIRE TO DETERMINE THE FBI'S TRANSLATION NEEDS

FROM SPANISH INTO ENGLISH

% OF YOUR TIME

ORAL TASKS

Interpretation Assignments:

CHECK AS MANY AS ARE APPLICABLE:

conferences

Oral Proficiency Examinations:

([illegible])

% OF YOUR TIME

GRADING OF FOREIGN LANGUAGE EXAMINATIONS

TASKS INVOLVING WRITTEN MATERIAL

% OF YOUR TIME SPENT IN TRANSLATING
% OF YOUR TIME SPENT IN SUMMARIZING

Newspapers/Magazines:

CHECK AS MANY AS ARE APPLICABLE:

politics

business/finance

general theft/white collar crime

organized crime

narcotics trafficking

domestic/international terrorism

foreign counterintelligence

science/technology

legal

other (please specify)
QUESTIONNAIRE RESULTS

TOTAL NUMBER OF RESPONDENTS: 28

AVERAGE TIME SPENT

(Averages were calculated based on the number of respondents to each
question; 0% answers were not factored in unless all answers were
0.)
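The averaging rule stated above can be expressed as a small function. This Python sketch is an interpretation rather than the procedure the authors actually used: it assumes each question's answers are a list of percentages from the respondents to that question, leaves 0% answers out of the average, and reports 0 only when every answer was 0. The function name is hypothetical.

```python
from typing import List, Optional


def average_time(answers: List[float]) -> Optional[float]:
    """Average % of time over the respondents to one question.

    0% answers are excluded from the average unless every answer was 0,
    in which case the average is 0. Returns None if nobody answered.
    """
    if not answers:
        return None
    nonzero = [a for a in answers if a != 0]
    if not nonzero:
        return 0.0
    return sum(nonzero) / len(nonzero)


# Hypothetical answers to one question.
print(average_time([10, 0, 20]))  # the 0% answer is ignored
print(average_time([0, 0]))       # every answer was 0
```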



ORAL TASKS

Interpretation Assignments
Number of respondents: 15/28
Average % of time spent: 4.9%

The most frequent category checked by respondents was
"unannounced visitors." Under "other," respondents listed tasks
such as interviewing suspects, handling complaints, and
debriefing informants, witnesses, and subjects.

Oral Proficiency Examinations
Number of respondents: 1/28
Average % of time spent: 1.0%

GRADING OF FOREIGN LANGUAGE EXAMINATIONS
Number of respondents: 1/28
Average % of time spent: 70.0%

TASKS INVOLVING WRITTEN MATERIAL

Newspapers/Magazines
% of time spent translating: 23.3% (12/28 respondents)
% of time spent summarizing: 21.0% (6/28 respondents)

The categories most chosen by respondents were politics,
narcotics, terrorism, foreign counterintelligence, legal, theft,
and organized crime. The other categories were seldom chosen.
Letters to the Director and other FBI officials
% of time spent translating: [illegible] (4/28 respondents)
% of time spent summarizing: [illegible] (1/28 respondents)

Teletype
% of time spent translating: [illegible] (1/28 respondents)
% of time spent summarizing: [illegible] (0/28 respondents)

Legal/Technical

General Theft/White Collar Crime
% of time spent translating: [illegible] (12/28 respondents)
% of time spent summarizing: [illegible] (2/28 respondents)

All categories were chosen by respondents. Under "other,"
translation of letters was indicated, as well as translation of
affidavits and signed statements. These "other" items were
repeated throughout this section.
Organized Crime
% of time spent translating: 8.1% ([illegible]/28 respondents)
% of time spent summarizing: [illegible] (1/28 respondents)

The category most frequently chosen was "police reports."

Narcotics Trafficking
% of time spent translating: 37.5% (16/28 respondents)
% of time spent summarizing: [illegible] (4/28 respondents)

The category most frequently chosen was "court records." Under
"other," translation of letters and ledger (log) notes was
indicated, as were T-III and T-IV translations.

Domestic/International Terrorism
% of time spent translating: 53.2% (10/28 respondents)
% of time spent summarizing: 26.6% (2/28 respondents)

The most frequent responses were "case histories" and "court
records." Among "other" responses was translation of
communiqués.
Foreign Counterintelligence
% of time spent translating: [illegible] (18/28 respondents)
% of time spent summarizing: 24.[illegible]% (7/28 respondents)

The category most frequently chosen was "status and evaluation
reports." Under "other," categories listed include political and
military intelligence and defectors' reports.

Treaty Requests/Letters Rogatory
% of time spent translating: 0.75% (2/28 respondents)
% of time spent summarizing: 0 (0/28 respondents)

Scientific/Technical
% of time spent translating: 12% (6/28 respondents)
% of time spent summarizing: 0 (0 respondents)

The categories most frequently chosen were explosive and
incendiary devices, weapons, and automobiles and other vehicles.
Fingerprinting/DNA typing and computer technology were seldom
chosen.
Medical Reports
% of time spent translating: 3.[illegible]% (5/28 respondents)
% of time spent summarizing: 0 (0 respondents)

"Other" responses include medical reports to be used as evidence,
progress reports, and hospital reports.

Patents
Number of respondents: 0/28 (translating); 0 (summarizing)

Other (respondent listed police reports and ownership/sale
documents)
% of time spent translating: 2% (1/28 respondents)
% of time spent summarizing: 0 (0 respondents)
TASKS INVOLVING LISTENING

Broadcasts
% of time spent translating: 44.2% (10/28 respondents)
% of time spent summarizing: 73% (6/28 respondents)

The most frequently chosen category is "narcotics trafficking."
Business/finance, economics, science/technology, military, and
legal were chosen seldom, if at all. "Other" tasks include radio
transmissions and ship-to-shore and ship-to-ship broadcasts.



Monitoring of Live Conversations
Telephone:
% of time spent translating: 95.5% (21/28 respondents)
% of time spent summarizing: 25.6% (15/28 respondents)

Categories most often chosen include theft/white collar crime,
organized crime, narcotics trafficking, terrorism, and
counterintelligence. The other categories were seldom chosen.



Body Microphone:
% of time spent translating: 21.8% (16/28 respondents)
% of time spent summarizing: 30.6% (8/28 respondents)

The item chosen most often is narcotics trafficking. The other
items on the checklist were seldom chosen. "Other" responses
included microphone surveillance of live monitoring, Title III
live monitoring, TIV, and room ("hidden") mikes.
Recorded Conversations

Telephone:
% of time spent translating: 60.[illegible]% (14/28 respondents)

The items most frequently chosen are the same as those for live
conversations. The individual participants seem to have a wider
range of experience with recorded rather than live material.

Body Recorder:
% of time spent translating: 26.0% (26/28 respondents)
% of time spent summarizing: 32.0% (9/28 respondents)

Other (answers included pretext calls and consensual
recordings):
% of time spent translating: 27.8% (6/28 respondents)
% of time spent summarizing: [illegible] (4/28 respondents)
SECOND QUESTIONNAIRE: QUESTIONNAIRE TO DETERMINE FBI'S
TRANSLATION NEEDS

ORAL TASKS

Interpretation Assignments

Number of respondents: [illegible]
% of time spent: 5%

The category most often chosen is "unannounced visitors." A
frequent category listed under "other" is listening to three-way
phone calls. Other categories include field interviews of
witnesses and polygraph examinations.

Oral Proficiency Test

Number of respondents: 1/28
% of time spent: 4%

WRITTEN TASKS

Legal Documents
% of time spent translating: 15% (11/28 respondents)
% of time spent summarizing: 10.5% (2/28 respondents)

All categories were checked, but "extradition requests" was
chosen very infrequently. "Other" categories listed include:
police reports, depositions, foreign consulate reports, and
statements.

Booklets/Manuals
% of time spent translating: [illegible] (6/28 respondents)
% of time spent summarizing: [illegible] (1/28 respondents)

"Training manuals" and "science/technology" were the items most
often chosen.
Forms
% of time spent translating: [illegible] (3/28 respondents)

"Bureau forms" was checked most often.

% of time spent translating: 3% (2/28 respondents)
% of time spent [illegible]: 1% (2/28 respondents)
% of time spent summarizing: 0 (0 respondents)

"Other" responses include correspondence and press releases.
SECTION C - Description/Specs./Work Statement

A. The following requirements and goals must be met by the offeror:

a. The developed translation test will be [...]
translation skills of individuals.

b. Currently, translation skills are tested by means of written
tests, which are to be translated verbatim from the foreign
language into English and from English into the foreign
language. The various tests vary in difficulty as well as in form
and type of content. Due to the test form and the lack of clear,
standardized scoring criteria, the scores tend to lack
consistency and, hence, reliability. The tests lack some content
validity, because they fail to measure summary translation skills
from audio stimuli.

c. The contractor is to provide scoring criteria based on, and
consistent with, the Interagency Language Roundtable (ILR) level
descriptions, with a scale from 0 to 5. (See Attachment D for a
copy of the ILR level descriptions for speaking, listening,
reading, and writing.) The test should be constructed in such a
[...] easy, but finely calibrated scoring, perhaps by means of
specified point penalties for categories of errors, e.g.,
mistranslation, grammar, word choice, style, etc., with an exact,
easy-to-apply notation system, which would ultimately result in a
score that can be converted to the 0 through 5 scale. A rating
sheet to register error types and calibrations will be helpful for
this purpose.
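The statement of work suggests scoring by point penalties for categories of errors and converting the result to the 0 through 5 scale. A minimal Python sketch of that idea follows. Everything specific in it is hypothetical: the category names mirror the examples listed above, but the penalty weights, the 100-point total, and the straight-line conversion to 0-5 are invented for illustration and are not the ESVTE's actual scoring rules.

```python
# Hypothetical penalty weights per error category (not the ESVTE's rules).
PENALTIES = {
    "mistranslation": 3.0,
    "grammar": 1.0,
    "word_choice": 1.0,
    "style": 0.5,
}


def penalty_score(error_counts, max_points, penalties=PENALTIES):
    """Start from max_points and deduct a fixed penalty per recorded error.

    error_counts maps an error category to the number of times a rater
    marked it on the rating sheet; the score never drops below zero.
    """
    deducted = sum(penalties[cat] * n for cat, n in error_counts.items())
    return max(0.0, max_points - deducted)


def to_zero_to_five(points, max_points):
    """Naive linear conversion of 0..max_points onto the 0-5 scale,
    rounded to one decimal (an illustrative assumption)."""
    return round(5.0 * points / max_points, 1)


# One hypothetical rating sheet for a 100-point passage.
sheet = {"mistranslation": 2, "grammar": 3, "style": 2}
points = penalty_score(sheet, max_points=100)
print(points, to_zero_to_five(points, 100))
```

A fixed linear mapping is the simplest possible conversion; an empirically derived one, such as the regression-based conversion table discussed in the memo above, would take the place of `to_zero_to_five`.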
The developed translation test should consist of an audio stimulus
to test summary translation skill up to level 3, to establish a
floor, plus a written stimulus to test full, verbatim translation
skills between levels [...] and 5, to establish a ceiling. There
should be at least one alternate version of the test for
retesting purposes.

The contractor will be able, to some extent, to draw on the
expertise of the master translators in the FBI, and personnel from
the FBI could also be used for the audio portions of the test if
desired.

The desired output should include a model and alternate in
English, a Spanish test plus an alternate, and possibly additional
tests in other languages, all of which should have been
field-tested to provide quantifiable data regarding reliability,
validity, administrative ease, and scorability.

Upon completion of the contract, the contractor will provide
written instructions for the grading of the tests and, if
necessary, a training session.

All materials generated during the course of the research,
including notes and rough drafts, are to be turned over to the
FBI.
Deliverables

The following are required to be furnished:

a. Monthly progress reports

b. Translation skill level descriptions

c. Audio cassettes with oral recordings of stimuli and
appropriate documentation:

(1) one plus an alternate in English

(2) one plus an alternate in Spanish

f. Hard copies of written stimuli and appropriate documentation:

(1) one plus an alternate in English

(2) one plus an alternate in Spanish

g. Grading procedures, rating sheets, and appropriate training
manual

[...] training at FBI, 10th and Pennsylvania Avenue, N.W.,
Washington, D.C.